Batch Processing Data With WordPress via HTTP

For a recent project, we found ourselves in need of a way to verify the integrity of an entire site’s worth of posts after a mistake in the initial migration meant some of the posts in the database were truncated/cut-off.

Our options were limited. Since our host (WP Engine) doesn’t offer shell access and we can’t connect directly to our database, we would have to write a script to accomplish via HTTP requests.

We wanted something that was at least semi-unattended. We’d also need to be careful to avoid exhausting PHP’s memory limit or running up against any PHP or Apache timeout settings.

Given the constraints, processing of posts would have to happen over several HTTP requests.

Enter batchProcessHelper a class that provides ways to accomplish all of the above.

How it works

You can find all the details on usage at the project’s Github page.

batchProcessHelper is meant to be extended. There are two methods you must override: load_data and process_item.

The first method, load_data, is responsible for accessing the data you plan to process and returning an array of serializable data that will be the data queue. The array can consist of objects or associative arrays or be something as simple as a list of IDs or range of numbers. This is entirely up to you.

The only constraints is that load_data must return a data structure that can be serialized for storage in the database as WordPress transient. This is will be your data queue.

The second method, process_item, is where you put the business code to process individual items in the queue.

This method should return true if it succeeds in processing an item, false if it fails.

How to use it

Here’s a super simple example of a class extending batchProcessHelper. You can see and learn how to run the full example on the project Github page.

include_once('batchProcessHelper.php');

class userImportProcess extends batchProcessHelper {

    function load_data() {
        return csv_to_array(ABSPATH . 'example_data.csv');
    }

    function process_item($item) {
        $this->log('Processing item: ' . var_export($item, true));
        return true;
    }
}

This example doesn’t actually do anything. In load_data, we simply load the example data from csv, converting it to an array of associative arrays.

The process_item method logs each of the associative arrays to the batchProcessHelper debug/error log.

However, this code could easily modified to create a new user for each row in example_data.csv.

function process_item($item) {
    $result = wp_insert_user($item);
    if (!is_wp_error($result)) {
        $this->log(‘Successfully created new user: ‘ . $item[‘user_email’]);
        return true;
    }
    return false;
}

Instantiating and running

So, we’ve defined our userImportProcess class, perhaps in a file named userImportProcess.php. How do we run it?

$process = new userImportProcess(array(
    'blog_id' => 99,
    'batch_size' => 10,
    'batch_identifier' => 'User Import’,
    'log_file' => '/path/to/your/file.log'
));

$process->process();

The only requirement is a batch_identifier — a name for your process.

Optionally specify the batch_size. If you don’t, batch_size defaults to 10.

Also, optionally specify a log_file — a path where debug and error messages will be sent. If you don’t specify log_file, batchProcessHelper will try to write one in /tmp.

If you want your process to run in the context of a specific blog, specify a blog_id as well.

Finally, call the process method to start.

Where do I put this stuff?

You’ll need to create a new directory in the root of your WordPress install. It doesn’t matter what it’s called, it just has to be in the root of your WordPress install.

Let’s say you create a directory named wp-scripts. You’ll want to put batchProcessHelper.php and userImportProcess.php in that directory.

Also, if you’re loading data from the filesystem (as in the example above and on the project’s Github page), you’ll want to place the data file in the appropriate location.

Finally

Visit userImportProcess.php in your browser.

If all goes well, you should see a message along the lines:

“Finished processing 10 items. There are 390 items remaining. Processing next batch momentarily…"

At this point, the page will refresh automatically, kicking off work for the next batch.

Once all batches are done, you’ll see the message:

“Finished processing all items.”

It should all go something like this:

Notes

If you have a multisite install, you'll need a Super Admin account to run code that extends batchProcessHelper. If you have a standalone install, you'll need an account with Administrator privileges.

Also, if you plan on keeping your wp-scripts directory around for any length of time, you should consider allowing only a select few IP addresses to access the directory. Refer to the project's README for more information.

How To Migrate A Standalone WordPress Blog To A Multisite Install

Sigh. Database migrations — often nerve-wracking, always tedious, never a task I’ve been fond of. However, as you know, they’re often necessary. For us, they come up frequently.

One of the benefits available to INN members is the option to have their Largo-powered WordPress website hosted on our platform, largoproject.org. As such, we often find we need to migrate their standalone blog to our multisite rig.

To ease the process, I wrote a single_to_multisite_migration command that’s included in our deploy tools repo.

Note: if you’re using WP Engine, our deploy tools include a bunch of helpful goodies. You can read more about them here.

Assumptions

  • You have mysql installed on your computer
  • You’re using INN’s deploy tools with your multisite project
  • The standalone blog uses the standard "wp_" WordPress database prefix

What it takes to migrate a standalone blog

My teammate Adam Schweigert put together this gist that describes the changes required to prepare a standalone blog’s database tables for a new home amongst other blogs in a multisite install.

The process involves renaming all of the blog’s tables from wp_[tablename] to wp_[newblogid]_[tablename]. For example, wp_posts might become wp_53_posts.

This is true for all tables except for wp_users and wp_usermeta. More on this later.

This brings up the question of where the value for new_blog_id comes from. The easiest way to get this value is to create a new blog in your multisite install. By doing this, you’re creating a skeleton of database tables that your standalone blog will fill in.

Step one: create a new blog in your multisite install

After creating a new blog, find it in your Network > Sites list. Hover the site’s link and you’ll see an url in your status bar.

Screen Shot 2014-08-18 at 4.19.28 PM


The “id=53” is the part you want. Make note of the site ID as you’ll need it later.

Step two: retrieving a database dump and running the single_to_multisite_migration command

Let’s look at the signature of the single_to_multisite_migration command:

def single_to_multisite_migration(name=None, new_blog_id=None, ftp_host=None, ftp_user=None, ftp_pass=None):
    ...

The name and new_blog_id parameters are required. The name will be used when creating a database in your local mysql server. This is where we’ll load the single blog’s database dump. It doesn’t matter much what name is, but it should conform to mysql’s rules for identifiers.

The new_blog_id is the ID that you made note of earlier.

If you’re using WP Engine to host the standalone blog, deploy tools can retrieve a recent database dump for you automatically. For this to work, you’ll need to provide your FTP credentials when running single_to_multisite_migration.

Here’s an example:

$ fab single_to_multisite_migration:blogname,53,myinstallname.wpengine.com,ftpusername,ftppassword

If you’re not using WP Engine, you’ll need to get a database dump by some other means. Once you have it, place it in the root directory of your multisite project repo. The single_to_multisite_migration command expects a mysql.sql file in this location, so you may need to rename your dump file to meet this expectation.

After you have the mysql.sql dump in the root of your multisite project repo:

$ fab single_to_multisite_migration:blogname,53
Example of the output from the single_to_multisite_migration command
Example output from the single_to_multisite_migration command

Step three: wait… rejoice! Your multisite_migration.sql file is ready!

Depending how big your standalone blog’s database is, it may take a while for the command to finish.

Message indicating the single_to_multisite_migration command finished properly.
Message indicating the single_to_multisite_migration command finished properly

Step four: apply the multisite_migration.sql file to your multisite database

I leave it up to you to decide how to best to apply the migration file to your database. You may be able to import the sql using phpMyAdmin, or, if you’re using WP Engine, you might contact their lovely support staff and ask them to apply it for you. Be clear that you DO NOT want to drop all tables in your multisite database before importing the multisite_migration.sql file.

Aside from renaming the standalone blog’s tables, what does single_to_multisite_migration do?

Great question. Here’s the short list:

  • Finds the maximum user ID in your multisite database and uses that value to offset the ID’s of users in your standalone blog’s wp_users table so that they can be inserted into the multisite database without duplicate primary key errors.
  • Finds the maximum user meta ID in your multisite database and uses that value to offset the umeta_id's in your standalone blogs wp_usermeta table so that user meta can be inserted into the multisite database without duplicate primary key errors.
  • Retains the siteurl and home values in your multisite “skeleton” site’s wp_[new_blog_id]_options table. This means that you won’t have to (re)set your new multisite blog’s site url and home url values after applying the multisite_migration.sql file.
  • Looks through all of the standalone blog's posts to find "wp-content/uploads/" and replace it with "wp-content/blogs.dir/[new_blog_id]/files/" to help with migrating uploads to the multisite install.
  • Uses REPLACE, UPDATE and some tricky subqueries to insert or update rows. This means you can apply the multisite_migration.sql file to your multisite database and avoid duplicate wp_users and wp_usermeta entries. Helpful if you need to run several incremental migrations to get all of a blog’s content before making the official transition to the multisite install.

Migrating uploads

The other thing you'll need to do is move your standalone blog's uploads directory. The single_to_multisite_migration command doesn't do this for you. You'll have to manually move the contents of the standalone blog's "wp-content/uploads/" to the multisite install "wp-content/blogs.dir/[new_blog_id]/files/" directory.

Test your migrations thoroughly before deploying.

We’ve tested and used this method several times with great success. It works well for us.

That said, remember to thoroughly test migrations before applying them to any production environment.

Break your local development site or your staging server first and be happy about it! Then fix any mistakes and you'll be ready to apply to your production database.