Batch Processing Data With WordPress via HTTP

For a recent project, we found ourselves in need of a way to verify the integrity of an entire site’s worth of posts after a mistake in the initial migration meant some of the posts in the database were truncated/cut-off.

Our options were limited. Since our host (WP Engine) doesn’t offer shell access and we can’t connect directly to our database, we would have to write a script to accomplish via HTTP requests.

We wanted something that was at least semi-unattended. We’d also need to be careful to avoid exhausting PHP’s memory limit or running up against any PHP or Apache timeout settings.

Given the constraints, processing of posts would have to happen over several HTTP requests.

Enter batchProcessHelper a class that provides ways to accomplish all of the above.

How it works

You can find all the details on usage at the project’s Github page.

batchProcessHelper is meant to be extended. There are two methods you must override: load_data and process_item.

The first method, load_data, is responsible for accessing the data you plan to process and returning an array of serializable data that will be the data queue. The array can consist of objects or associative arrays or be something as simple as a list of IDs or range of numbers. This is entirely up to you.

The only constraints is that load_data must return a data structure that can be serialized for storage in the database as WordPress transient. This is will be your data queue.

The second method, process_item, is where you put the business code to process individual items in the queue.

This method should return true if it succeeds in processing an item, false if it fails.

How to use it

Here’s a super simple example of a class extending batchProcessHelper. You can see and learn how to run the full example on the project Github page.

include_once('batchProcessHelper.php');

class userImportProcess extends batchProcessHelper {

    function load_data() {
        return csv_to_array(ABSPATH . 'example_data.csv');
    }

    function process_item($item) {
        $this->log('Processing item: ' . var_export($item, true));
        return true;
    }
}

This example doesn’t actually do anything. In load_data, we simply load the example data from csv, converting it to an array of associative arrays.

The process_item method logs each of the associative arrays to the batchProcessHelper debug/error log.

However, this code could easily modified to create a new user for each row in example_data.csv.

function process_item($item) {
    $result = wp_insert_user($item);
    if (!is_wp_error($result)) {
        $this->log(‘Successfully created new user: ‘ . $item[‘user_email’]);
        return true;
    }
    return false;
}

Instantiating and running

So, we’ve defined our userImportProcess class, perhaps in a file named userImportProcess.php. How do we run it?

$process = new userImportProcess(array(
    'blog_id' => 99,
    'batch_size' => 10,
    'batch_identifier' => 'User Import’,
    'log_file' => '/path/to/your/file.log'
));

$process->process();

The only requirement is a batch_identifier — a name for your process.

Optionally specify the batch_size. If you don’t, batch_size defaults to 10.

Also, optionally specify a log_file — a path where debug and error messages will be sent. If you don’t specify log_file, batchProcessHelper will try to write one in /tmp.

If you want your process to run in the context of a specific blog, specify a blog_id as well.

Finally, call the process method to start.

Where do I put this stuff?

You’ll need to create a new directory in the root of your WordPress install. It doesn’t matter what it’s called, it just has to be in the root of your WordPress install.

Let’s say you create a directory named wp-scripts. You’ll want to put batchProcessHelper.php and userImportProcess.php in that directory.

Also, if you’re loading data from the filesystem (as in the example above and on the project’s Github page), you’ll want to place the data file in the appropriate location.

Finally

Visit userImportProcess.php in your browser.

If all goes well, you should see a message along the lines:

“Finished processing 10 items. There are 390 items remaining. Processing next batch momentarily…"

At this point, the page will refresh automatically, kicking off work for the next batch.

Once all batches are done, you’ll see the message:

“Finished processing all items.”

It should all go something like this:

Notes

If you have a multisite install, you'll need a Super Admin account to run code that extends batchProcessHelper. If you have a standalone install, you'll need an account with Administrator privileges.

Also, if you plan on keeping your wp-scripts directory around for any length of time, you should consider allowing only a select few IP addresses to access the directory. Refer to the project's README for more information.