Step-by-step: Storing Medium article information in WordPress Custom Post Type (rather than reading RSS)

Ascend
10 min read · Aug 24, 2023


This article is a continuation from part one, where we read remote information from Medium via the RSS feed before outputting it onto our web page in WordPress.

Store it in WordPress? Let’s start with “why”

Why bother? The code in part one works, we have a feed showing and that should be that, right? Well, not exactly.

  1. If Medium goes down, you may not be able to retrieve data from the RSS feed.
  2. Without creating some advanced logic, there’s no opportunity to easily control the caching of content yourself because image media is streamed directly from Medium.
  3. You may not want to show every article on your website. Storing articles locally lets you hide one manually by setting its status to “draft” within WordPress.

Prerequisites

Same as previously. There’s more explanation for each point in article one, but we thought it worth recapping briefly here.

  • A WordPress installation
  • Understanding of WordPress coding
  • Some type of IDE/code editor: we’re still using PHPStorm
  • At least PHP 8.0
  • Advanced Custom Fields
  • TailwindCSS: we’ll be using it in any HTML code examples.

What are we going to do today?

  1. Register a “Medium” custom post type
  2. Create a scheduled event in WordPress to automate getting our stories from Medium
  3. Using our code from last time, from within the scheduled event, call Medium to get articles using the RSS feed
  4. Store information from Medium against the custom post type
  5. Persist the featured image into our WordPress instance, which will also appear in the Media section
  6. Output our Medium feed to our website using a WP_Query call instead of the RSS feed every time

Creating the custom post type

In our actual code base, this logic is all contained within a plugin. We use the wppb boilerplate, but our code in this article won’t make reference to it. Many WordPress developers are still intimidated by, or unfamiliar with, many Object Orientated Programming (OOP) concepts, and we don’t want to confuse things or end up down an OOP-shaped rabbit hole. (We could go on for days about why wppb is so great. We encourage you to use it.)

You can take any of the code below and adapt it for wppb though, easily adding object references and function visibility if you wish. Given most examples of WordPress code (even today) are procedural, we’ll follow suit. We’ll also assume this functionality will be added into the functions.php file.

This logic though definitely belongs in a plugin as the data in your post types will likely outlive your theme. You don’t want to have a future problem of data portability or loss because your custom post types die with your legacy theme.

function register_medium_custom_post_type(): void
{
    $labels = [
        'name' => 'Medium',
        'singular_name' => 'Medium',
        'add_new' => 'Add Article',
        'add_new_item' => 'Add New Article',
        'edit_item' => 'Edit Article',
        'new_item' => 'New Article',
        'view_item' => 'View Article',
        'search_items' => 'Search Articles',
        'not_found' => 'No Article found',
        'not_found_in_trash' => 'No Articles found in Trash',
        'parent_item_colon' => 'Parent Article:',
        'menu_name' => 'Medium Articles'
    ];

    register_post_type('medium', [
        'labels' => $labels,
        'show_ui' => true,
        'rewrite' => ['slug' => 'medium'],
        'supports' => ['title', 'editor', 'thumbnail', 'excerpt', 'comments'],
        'menu_icon' => 'dashicons-rss',
    ]);
}
add_action('init', 'register_medium_custom_post_type');

If you want an understanding of arguments you can provide to the register_post_type method and what their values/defaults are, you can find more in the WordPress Docs.

Set-up the scheduled event

Scheduled events in WordPress use WP-Cron. This is what WordPress uses to handle scheduling time-based tasks. Several core features, such as checking for updates and publishing scheduled posts utilise this functionality.

The “Cron” part of the name comes from the cron time-based task scheduling system that is available on UNIX systems. That though is where the similarities end. Unlike a typical cron job, WP-Cron does not run constantly; it is only triggered on page load.

This means when we schedule the job below for 02:00, if no pages are loaded on your website until 02:04, that’s when the job will run. You can get super technical at server level to override this and be more precise, but we’re not that fussed about that for this article. Maybe another time, huh?

if (!wp_next_scheduled('look_for_new_medium_articles')) {
    wp_schedule_event(
        (new DateTime())->modify('tomorrow')->setTime(2, 0)->getTimestamp(),
        'daily',
        'look_for_new_medium_articles'
    );
}

What is the above code doing?

  1. Firstly, we’re setting the schedule to run from 2am tomorrow morning. We only want to look for articles when we have very little traffic.
  2. Secondly, we’re telling the job to run daily thereafter
  3. Thirdly, we’re defining the name of the event we’re scheduling, which is going to be the action we’ll need to hook onto.
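To see what timestamp that first argument produces, here is a standalone sketch that uses a fixed "now" so the result is reproducible. The function name next_daily_run_timestamp and the Europe/London timezone are our own illustrative assumptions, not part of the original code. One thing to be aware of: WordPress sets PHP's default timezone to UTC, so new DateTime() gives you 02:00 UTC rather than 02:00 in your site's timezone. Converting to an explicit timezone first is one way to account for that.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: timestamp of the next occurrence of $hour:00
 * "tomorrow" in the given timezone, relative to $now.
 */
function next_daily_run_timestamp(DateTimeImmutable $now, string $timezone, int $hour = 2): int
{
    return $now
        ->setTimezone(new DateTimeZone($timezone)) // think in site-local time
        ->modify('tomorrow')                       // midnight at the start of tomorrow
        ->setTime($hour, 0)                        // move to 02:00 local time
        ->getTimestamp();                          // Unix timestamps are timezone-independent
}

// Fixed "now" for a reproducible example: 15:30 UTC on 24 August 2023.
$now = new DateTimeImmutable('2023-08-24 15:30:00', new DateTimeZone('UTC'));
$timestamp = next_daily_run_timestamp($now, 'Europe/London');

// London is on UTC+1 in August, so 02:00 local time is 01:00 UTC.
echo gmdate('Y-m-d H:i:s', $timestamp), " UTC\n";
```

Inside WordPress, you could pass wp_timezone_string() as the timezone and hand the result to wp_schedule_event as its first argument.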

If you’re using wppb, this code goes inside of your activator. In your deactivator, you would clear the scheduled task by calling wp_clear_scheduled_hook('look_for_new_medium_articles');

Hook onto the scheduled action & query Medium

function look_for_new_medium_articles_callback(): void
{
    $xml = simplexml_load_string(
        file_get_contents(
            get_field('medium_rss_feed') // https://ascend-agency.medium.com/feed
        )
    );

    ...

    foreach($xml->channel->item as $item) {

        $encodedContent = (string) $item->children($namespaces['content'])->encoded;

        /**
         * HERE IS WHERE WE ARE GOING TO SAVE CONTENT TO WORDPRESS
         */
    }
}
add_action('look_for_new_medium_articles', 'look_for_new_medium_articles_callback');

Inside the callback, we’re going to reuse the code from the last article that parses the Medium RSS feed and builds our custom array.

The difference now is that we’re going to persist the values into WordPress, rather than hold them in a neater array structure.
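As a refresher on how the namespaced elements are read, here is a minimal standalone sketch that parses an inline feed rather than calling Medium. The sample XML, its values and the function name parse_medium_feed_items are hypothetical; only the element names mirror what the loop above reads. It also shows one way to cope with a feed that fails to parse, which is an addition of ours.

```php
<?php
declare(strict_types=1);

// Hypothetical sample feed with made-up values, for illustration only.
$sampleFeed = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Hello world</title>
      <link>https://example.com/hello-world</link>
      <guid isPermaLink="false">https://example.com/p/abc123</guid>
      <atom:updated>2023-08-24T09:15:00.000Z</atom:updated>
      <content:encoded><![CDATA[<p>Intro</p><img src="https://example.com/img/12345.jpeg">]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

/**
 * Parse raw RSS into plain arrays; returns an empty array if the XML is invalid.
 */
function parse_medium_feed_items(string $rawXml): array
{
    // Collect parser errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($rawXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return [];
    }

    // Map of prefix => namespace URI for every namespace used in the document.
    $namespaces = $xml->getNamespaces(true);

    $items = [];
    foreach ($xml->channel->item as $item) {
        $items[] = [
            'guid'    => (string) $item->guid,
            'title'   => (string) $item->title,
            'link'    => (string) $item->link,
            // Namespaced children are only reachable via children($namespaceUri).
            'updated' => (string) $item->children($namespaces['atom'] ?? null)->updated,
            'content' => (string) $item->children($namespaces['content'] ?? null)->encoded,
        ];
    }

    return $items;
}

$items = parse_medium_feed_items($sampleFeed);
echo $items[0]['title'], ' / ', $items[0]['updated'], PHP_EOL;
```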

Let’s look at how we create a WordPress post programmatically:

$postId = wp_insert_post(...);

That wasn’t so bad, was it? The method accepts some arguments, which we’ll come onto in a minute. Full docs are available here if you want to dive deeper.

On success, the method returns the ID of the new post. On failure, it returns 0 by default, or a WP_Error if you pass true as the second argument.

These are the attributes we’re going to persist when creating our post.

$postId = wp_insert_post([
    'post_author' => 1,
    'post_date' => (string) $item->children($namespaces['atom'])->updated,
    'post_content' => '', // you could persist $encodedContent here if you wanted
    'post_title' => (string) $item->title,
    'post_status' => 'publish',
    'post_type' => 'medium',
    'guid' => (string) $item->guid,
]);
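One detail you may want to handle: the feed's updated value looks like an ISO 8601 timestamp (an assumption on our part, based on the Atom format), whereas WordPress conventionally stores post_date as Y-m-d H:i:s in the site's timezone. The sketch below shows one way to normalise the value before passing it to wp_insert_post. The function name medium_date_to_post_date is hypothetical. Returning an empty string on failure is deliberate, because wp_insert_post falls back to the current time when post_date is empty.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: convert a feed date into "Y-m-d H:i:s" in the given
 * timezone, or return '' if the value is missing or cannot be parsed.
 */
function medium_date_to_post_date(string $feedDate, string $timezone = 'UTC'): string
{
    // An empty string would otherwise be parsed as "now".
    if (trim($feedDate) === '') {
        return '';
    }

    try {
        $date = new DateTimeImmutable($feedDate);
    } catch (Exception $e) {
        return '';
    }

    return $date
        ->setTimezone(new DateTimeZone($timezone))
        ->format('Y-m-d H:i:s');
}

echo medium_date_to_post_date('2023-08-24T09:15:00.000Z'), PHP_EOL;
echo medium_date_to_post_date('2023-08-24T09:15:00.000Z', 'Europe/London'), PHP_EOL;
```

Inside the loop, this could wrap the existing value, for example medium_date_to_post_date((string) $item->children($namespaces['atom'])->updated, wp_timezone_string()).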

Now we need to save our featured image to disk. We’re going to hold this logic in a separate method though to keep the iterative loop somewhat cleaner.

function save_medium_image_to_wordpress(
    string $imageUrl,
    int $postId
): void
{
    global $wpdb;

    // check the image isn't already persisted
    $query = $wpdb->prepare(
        "SELECT COUNT(*)
        FROM $wpdb->posts
        WHERE post_type = 'attachment'
        AND post_title = %s",
        pathinfo($imageUrl, PATHINFO_FILENAME)
    );

    // if we have no results, let's save it
    if(!$wpdb->get_var($query)) {

        // get the basename (filename + extension)
        $baseName = pathinfo($imageUrl, PATHINFO_BASENAME); // 12345.jpeg

        // handle the upload
        $contents = file_get_contents($imageUrl);
        $upload = wp_upload_bits($baseName, null, $contents);
        $wp_filetype = wp_check_filetype(basename($upload['file']), null);

        // build the attachment arguments
        $attachment = [
            'post_mime_type' => $wp_filetype['type'],
            'post_title' => preg_replace( '/\.[^.]+$/', '', basename($upload['file'])),
            'post_content' => '',
            'post_status' => 'inherit',
        ];

        // start upload and attach to the Medium post
        if($attachmentId = wp_insert_attachment($attachment, wp_slash($upload['file']), $postId)) {
            set_post_thumbnail($postId, $attachmentId);
        }
    }
}

We’d now be able to call this like so inside of the $item loop:

foreach($xml->channel->item as $item) {

    $encodedContent = (string) $item->children($namespaces['content'])->encoded;

    $postId = wp_insert_post([
        'post_date' => (string) $item->children($namespaces['atom'])->updated,
        'post_content' => '', // you could persist $encodedContent here if you wanted
        'post_title' => (string) $item->title,
        'post_status' => 'publish',
        'post_type' => 'medium',
        'guid' => (string) $item->guid,
    ]);

    if($postId && ($image = get_featured_image($encodedContent))) {
        save_medium_image_to_wordpress($image, $postId);
    }
}
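The image step relies on two small pieces of plain PHP: pulling the first image URL out of the encoded content (get_featured_image from part one) and deriving a filename from that URL. Here is a standalone sketch of both, runnable outside WordPress. The names first_image_src and image_basename_from_url are hypothetical, as is the sample HTML. Two defensive choices are our own additions: on PHP 8, DOMDocument::loadHTML rejects an empty string, so we return early, and we strip any query string before taking the basename so it doesn't end up in the saved filename.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical variant of get_featured_image(): return the src of the first
 * <img> in an HTML fragment, or '' if there is none.
 */
function first_image_src(string $html): string
{
    // loadHTML() does not accept an empty string on PHP 8.
    if (trim($html) === '') {
        return '';
    }

    // Feed HTML is rarely a valid document; keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query('//img/@src');
    $first = $nodes === false ? null : $nodes->item(0);

    return $first?->nodeValue ?? '';
}

/**
 * Hypothetical helper: filename plus extension from a URL, ignoring any
 * query string or fragment.
 */
function image_basename_from_url(string $imageUrl): string
{
    $path = (string) parse_url($imageUrl, PHP_URL_PATH);

    return pathinfo($path, PATHINFO_BASENAME);
}

// Made-up content for illustration.
$html = '<p>Intro</p>'
    . '<img src="https://example.com/max/1024/12345.jpeg?source=rss">'
    . '<img src="https://example.com/other.png">';

$src = first_image_src($html);
echo $src, PHP_EOL;
echo image_basename_from_url($src), PHP_EOL;
```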

Currently though, we’re going to re-save Medium articles over and over again and end up with many of the same stories multiple times. Why? Because we’re not doing any checks before calling wp_insert_post.

We can solve this quickly by using the $item->guid value we’re persisting during insertion as part of a quick database query.

global $wpdb;

foreach($xml->channel->item as $item) {

    $query = $wpdb->prepare(
        "SELECT COUNT(*) FROM $wpdb->posts
        WHERE `guid` = %s
        AND `post_type` = 'medium';",
        (string) $item->guid
    );

    if(!$wpdb->get_var($query)) {

        $encodedContent = (string) $item->children($namespaces['content'])->encoded;

        $postId = wp_insert_post([
            'post_date' => (string) $item->children($namespaces['atom'])->updated,
            'post_content' => '', // you could persist $encodedContent here if you wanted
            'post_title' => (string) $item->title,
            'post_status' => 'publish',
            'post_type' => 'medium',
            'guid' => (string) $item->guid,
        ]);

        if($postId && ($image = get_featured_image($encodedContent))) {
            save_medium_image_to_wordpress($image, $postId);
        }
    }
}

We’ll refactor this loop, abstracting logic into its own functions where appropriate, which should make the code more readable.

function medium_article_exists_in_wordpress(string $guid): ?string
{
    global $wpdb;

    $query = $wpdb->prepare(
        "SELECT COUNT(*) FROM $wpdb->posts
        WHERE `guid` = %s
        AND `post_type` = 'medium';",
        $guid
    );

    return $wpdb->get_var($query);
}

function build_medium_post_array_from_item(
    object $item,
    array $namespaces,
    string $content = ''
): array
{
    return [
        'post_date' => (string) $item->children($namespaces['atom'])->updated,
        'post_content' => $content,
        'post_title' => (string) $item->title,
        'post_status' => 'publish',
        'post_type' => 'medium',
        'guid' => (string) $item->guid,
    ];
}

foreach($xml->channel->item as $item) {

    if(!medium_article_exists_in_wordpress((string) $item->guid)) {

        $encodedContent = (string) $item->children($namespaces['content'])->encoded;

        $postId = wp_insert_post(
            build_medium_post_array_from_item($item, $namespaces)
        );

        if($postId && ($image = get_featured_image($encodedContent))) {
            save_medium_image_to_wordpress($image, $postId);
        }
    }
}

Next, we’ll persist some custom data against our new post, using custom fields that we’ve created via Advanced Custom Fields.

Persisting to ACF

Persisting data into ACF couldn’t be easier. The only thing you have to be really careful of, if you’re not using precise key names, is that there are no field name collisions. For that reason, our field names are incredibly long-tail.

Inside of the below condition, from within our foreach loop, we’re going to add in the following update_field call.

if($postId && ($image = get_featured_image($encodedContent))) {
    save_medium_image_to_wordpress($image, $postId);

    // save this data into ACF
    update_field('medium_url', (string) $item->link, $postId);
}

In lieu of providing the actual field key, which would look something like field_xyzabc, ACF will do the heavy lifting with the readable field name you set, which you can see easily from the ACF panel in the dashboard.

There may be other fields in the Medium RSS response that you want to store. If this is the case, simply add via ACF and introduce more update_field(...) calls beneath this one as required.

Summarising the code in one view

Let’s take a look at all this code now in one view since we’ve completed the bulk of the work.

/**
 * Save a Medium article image to WordPress
 */
function save_medium_image_to_wordpress(
    string $imageUrl,
    int $postId
): void
{
    global $wpdb;

    $query = $wpdb->prepare(
        "SELECT COUNT(*)
        FROM $wpdb->posts
        WHERE post_type = 'attachment'
        AND post_title = %s",
        pathinfo($imageUrl, PATHINFO_FILENAME)
    );

    if(!$wpdb->get_var($query)) {

        $baseName = pathinfo($imageUrl, PATHINFO_BASENAME); // 12345.jpeg

        $contents = file_get_contents($imageUrl);
        $upload = wp_upload_bits($baseName, null, $contents);
        $wp_filetype = wp_check_filetype(basename($upload['file']), null);

        $attachment = [
            'post_mime_type' => $wp_filetype['type'],
            'post_title' => preg_replace( '/\.[^.]+$/', '', basename($upload['file'])),
            'post_content' => '',
            'post_status' => 'inherit',
        ];

        if($attachmentId = wp_insert_attachment($attachment, wp_slash($upload['file']), $postId)) {
            set_post_thumbnail($postId, $attachmentId);
        }
    }
}

/**
 * Get a featured image from inside of the Medium article encoded content
 */
function get_featured_image(string $encodedContent): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($encodedContent);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $src = $xpath->query('//img/@src');

    return $src[0]?->nodeValue ?? '';
}

/**
 * Check if a Medium article GUID already exists in the database
 */
function medium_article_exists_in_wordpress(string $guid): ?string
{
    global $wpdb;

    $query = $wpdb->prepare(
        "SELECT COUNT(*) FROM $wpdb->posts
        WHERE `guid` = %s
        AND `post_type` = 'medium';",
        $guid
    );

    return $wpdb->get_var($query);
}

/**
 * Build an array to use to create a new Medium post in WordPress
 */
function build_medium_post_array_from_item(
    object $item,
    array $namespaces,
    string $content = ''
): array
{
    return [
        'post_date' => (string) $item->children($namespaces['atom'])->updated,
        'post_content' => $content,
        'post_title' => (string) $item->title,
        'post_status' => 'publish',
        'post_type' => 'medium',
        'guid' => (string) $item->guid,
    ];
}

/**
 * Process our RSS feed...
 */
$xml = simplexml_load_string(
    file_get_contents(
        get_field('medium_rss_feed') // https://ascend-agency.medium.com/feed
    )
);

$namespaces = $xml->getNamespaces(true);
foreach ($namespaces as $prefix => $ns) {
    $xml->registerXPathNamespace($prefix, $ns);
}

foreach($xml->channel->item as $item) {
    if(!medium_article_exists_in_wordpress((string) $item->guid)) {

        $encodedContent = (string) $item->children($namespaces['content'])->encoded;

        $postId = wp_insert_post(
            build_medium_post_array_from_item($item, $namespaces)
        );

        if($postId && ($image = get_featured_image($encodedContent))) {
            save_medium_image_to_wordpress($image, $postId);

            update_field('medium_url', (string) $item->link, $postId);
        }
    }
}

You may feel, like we do, that there’s a lot going on, and you’re not wrong. This though is where adopting an object orientated approach to coding makes life easier, as we can have dedicated classes to look after certain functionality: writing images, storing posts and so on can be delegated to Services or Utilities rather than procedural functions that lack greater context, structure or grouping.

Now let’s call our articles using WP_Query

This is something a bit more straightforward than what we’ve been doing up to this point. It’s hopefully more familiar to most.

$query = new WP_Query([
    'post_type' => 'medium',
    'post_status' => 'publish',
    'posts_per_page' => 3
]);

if($query->have_posts()) : while($query->have_posts()) : $query->the_post();

    // HTML processing will go here

endwhile; endif; wp_reset_postdata(); // restore the global $post after a secondary WP_Query loop

The greatest advantage to this approach is that because we are now inside of the WordPress loop, we can now use native WordPress methods to get content instead of being reliant on an array structure that lacks documentation.

<?php
$query = new WP_Query([
    'post_type' => 'medium',
    'post_status' => 'publish',
    'posts_per_page' => 3
]);

if($query->have_posts()) : while($query->have_posts()) : $query->the_post();
?>
    <article class="relative isolate flex flex-col justify-end overflow-hidden rounded-2xl bg-neutral-900 px-8 pb-8 pt-80 sm:pt-48 lg:pt-80">
        <?php the_post_thumbnail('full', ['class' => 'absolute inset-0 -z-10 h-full w-full object-cover']); ?>
        <div class="absolute inset-0 -z-10 bg-gradient-to-t from-neutral-900 via-neutral-900/40"></div>
        <div class="absolute inset-0 -z-10 rounded-2xl ring-1 ring-inset ring-neutral-300/10"></div>
        <div class="flex flex-wrap items-center gap-y-1 overflow-hidden text-sm leading-6 text-neutral-300">
            <time datetime="<?php echo esc_attr(get_the_date('c')); ?>" class="sr-only">
                <?php echo human_time_diff(get_the_time('U'), current_time('timestamp')); ?> ago
            </time>
        </div>
        <h3 class="mt-4 text-lg font-semibold leading-6 text-white">
            <?php
            // the_title() prints the opening markup, the title, then the closing </a>
            the_title(
                sprintf(
                    '<a href="%s" target="_blank" rel="noopener"><span class="absolute inset-0"></span>',
                    esc_url(get_permalink())
                ),
                '</a>'
            );
            ?>
        </h3>
    </article>
<?php
endwhile; endif; wp_reset_postdata();

And there you have it. Medium article information retrieved from their RSS feed, processed and stored within your local WordPress infrastructure, giving you a platform for more fluent processing in your templates and control over database and cache policies.

A little advantage is that you can set any Medium article persisted into WordPress to the draft status if you don’t want it to show on the website!
