Batch Processing Data With WordPress via HTTP

For a recent project, we found ourselves in need of a way to verify the integrity of an entire site’s worth of posts after a mistake in the initial migration meant some of the posts in the database were truncated.

Our options were limited. Since our host (WP Engine) doesn’t offer shell access and we can’t connect directly to our database, we would have to write a script to accomplish the task via HTTP requests.

We wanted something that was at least semi-unattended. We’d also need to be careful to avoid exhausting PHP’s memory limit or running up against any PHP or Apache timeout settings.

Given the constraints, processing of posts would have to happen over several HTTP requests.

Enter batchProcessHelper, a class that provides ways to accomplish all of the above.

How it works

You can find all the details on usage at the project’s GitHub page.

batchProcessHelper is meant to be extended. There are two methods you must override: load_data and process_item.

The first method, load_data, is responsible for accessing the data you plan to process and returning an array of serializable data that will become the data queue. The array can consist of objects or associative arrays, or be something as simple as a list of IDs or a range of numbers. This is entirely up to you.

The only constraint is that load_data must return a data structure that can be serialized for storage in the database as a WordPress transient. This will be your data queue.

The second method, process_item, is where you put the business code to process individual items in the queue.

This method should return true if it succeeds in processing an item, false if it fails.

How to use it

Here’s a super simple example of a class extending batchProcessHelper. You can see the full example, and learn how to run it, on the project’s GitHub page.

include_once('batchProcessHelper.php');

class userImportProcess extends batchProcessHelper {

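    // Build the data queue: parse the CSV file into an array of associative arrays.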
    function load_data() {
        return csv_to_array(ABSPATH . 'example_data.csv');
    }

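    // Process one queue item; return true on success, false on failure.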
    function process_item($item) {
        $this->log('Processing item: ' . var_export($item, true));
        return true;
    }
}

This example doesn’t actually do anything. In load_data, we simply load the example data from the CSV file, converting it to an array of associative arrays.

The process_item method logs each of the associative arrays to the batchProcessHelper debug/error log.

However, this code could easily be modified to create a new user for each row in example_data.csv.

function process_item($item) {
    $result = wp_insert_user($item);
    if (!is_wp_error($result)) {
        $this->log('Successfully created new user: ' . $item['user_email']);
        return true;
    }
    return false;
}

Instantiating and running

So, we’ve defined our userImportProcess class, perhaps in a file named userImportProcess.php. How do we run it?

$process = new userImportProcess(array(
    'blog_id' => 99,
    'batch_size' => 10,
    'batch_identifier' => 'User Import',
    'log_file' => '/path/to/your/file.log'
));

$process->process();

The only requirement is a batch_identifier — a name for your process.

Optionally specify the batch_size. If you don’t, batch_size defaults to 10.

Also, optionally specify a log_file — a path where debug and error messages will be sent. If you don’t specify log_file, batchProcessHelper will try to write one in /tmp.

If you want your process to run in the context of a specific blog, specify a blog_id as well.

Finally, call the process method to start.

Where do I put this stuff?

You’ll need to create a new directory in the root of your WordPress install. It doesn’t matter what the directory is called; it just has to live in the root of the install.

Let’s say you create a directory named wp-scripts. You’ll want to put batchProcessHelper.php and userImportProcess.php in that directory.

Also, if you’re loading data from the filesystem (as in the example above and on the project’s GitHub page), you’ll want to place the data file in the appropriate location.

Finally

Visit userImportProcess.php in your browser.

If all goes well, you should see a message along the lines of:

“Finished processing 10 items. There are 390 items remaining. Processing next batch momentarily…”

At this point, the page will refresh automatically, kicking off work for the next batch.

Once all batches are done, you’ll see the message:

“Finished processing all items.”

Notes

If you have a multisite install, you'll need a Super Admin account to run code that extends batchProcessHelper. If you have a standalone install, you'll need an account with Administrator privileges.

Also, if you plan on keeping your wp-scripts directory around for any length of time, you should consider allowing only a select few IP addresses to access the directory. Refer to the project's README for more information.

How To Hold An Event Using Google Hangouts That Anyone Can Attend

After scrambling to find a way to set up a public Google Hangout for our last Book Club meeting, I thought it might be helpful to share my initial frustrations and how I finally managed to set it up.

First, I tried creating a new Google Hangout via Google Plus, inviting the public and saving the URL to return to later. Turns out that Hangouts created this way are ephemeral. After about five minutes with no attendees, the Hangout ends.

So the second thing I tried was adding an event to my Google calendar, since I knew that calendar events with video calls (Google Hangouts) attached are persistent (we use the same Google Hangouts link for our scrum every day). Unfortunately, I found that I was unable to invite the general public to a Google Calendar event.

The third thing I tried, after stumbling upon this documentation, was creating a Google Plus event. Not only was I able to create a public Google Hangout with a persistent link, but this also gives you the added benefit of having a place where people can RSVP and/or find more info about the event.

From what we've been able to tell, you're still limited to ten attendees on video at any one time and 100 concurrent users total in the chat, but this at least gives you a link you can share with anyone in advance of the event and a place where people can RSVP.

I created a quick how-to for this so that I don’t have to spend time head-scratching next time I need to schedule an event. You can find the complete how-to in the INN docs repo on GitHub.

Hopefully, you find it helpful, too.

Spaces or tabs: which will YOU choose?

Source: 1985 McCall's Magazine

What follows is a Very Important Blog Post. Please, take this seriously.

It’s a question that every tech team inevitably debates and, as with many topics in software development, people can be downright religious about their stance on spaces versus tabs.

I’m here to (officially >_<) declare such debates a waste of time.

Be flexible enough to use either, depending on the project.

What’s more important than an individual's preference for spaces or tabs is that the team agree on a style of writing code and stick with it.

In our case, since we do quite a bit of work with WordPress and WordPress’ best practices indicate the use of hard tabs, we decided to use them for our WordPress work.

For any code that we write outside the context of a specific framework/project (e.g., WordPress), we use four spaces. This includes HTML, JS, CSS, Python and, occasionally, PHP without WordPress.

Soon, you'll be able to see the details of our coding style guide. I'll update this post once that's available.

If you’re interested, my personal preference is spaces.

Here’s why: hard tab length is treated differently by different text editors and may be adjusted to suit an individual’s preference. This can mean varying degrees of readability for the same piece of code on different machines. It can also wreak havoc on code written in languages like Python or CoffeeScript, where whitespace is, in fact, significant.

For more information on why you should never ever use hard tabs (except sometimes), there’s no shortage of opinionated reading out there. For a less one-sided look, check out Tabs versus Spaces: An Eternal Holy War.

This concludes my Very Important Blog Post.

Editor's Note: ...but don't even get us started about using two spaces after a period.

Come Learn With Us! Announcing The News Nerd Book Club

INN encourages you to Always Be Learning.

To make sure we’re being responsible and accountable professional news nerds, today we're announcing a new monthly book club.

What we’re proposing is pretty simple — let’s read a book each month and then get together to talk about it.

We'll likely focus on books about design, development, team workflow and dynamics (especially remote/distributed teams), but we’re open to suggestions for other topics (we have a hackpad here to collect future book suggestions).

Since we also value learning and building in public, we want to open the book club up to anyone who wants to participate, not just our team and not just INN members. We’re hoping this will also be a way to get to know other people interested in or doing the kind of work we do.

Since we're a remote team we'll likely meet via Google Hangout, at least initially (although if we happen to be in the same city we may meet in person and let others join via hangout).

For September, the first book we'll read will be "On Web Typography" by Jason Santa Maria. You can find out more about the book and purchase it here.

Our team is planning to be in Chicago for the Online News Association Conference (Sept. 25-27), so we're having our first meeting in real life there on Thursday, September 25 at 5:30 CT, a short walk from the conference at Dollop Coffee and Tea (345 E Ohio St, Chicago, IL 60611). We'll also work out a way for people to join virtually and will send out details closer to the date.

Please RSVP if you plan to attend so we have a rough count.

As part of that meeting we also plan to have a book exchange, so if you're going to be at ONA and have old books sitting around that you'd like to swap, plan to bring them along.

Get updates by following @newsnerdbooks on Twitter and we hope to see you in Chicago!

How To Migrate A Standalone WordPress Blog To A Multisite Install

Sigh. Database migrations — often nerve-wracking, always tedious, never a task I’ve been fond of. However, as you know, they’re often necessary. For us, they come up frequently.

One of the benefits available to INN members is the option to have their Largo-powered WordPress website hosted on our platform, largoproject.org. As such, we often find we need to migrate their standalone blog to our multisite rig.

To ease the process, I wrote a single_to_multisite_migration command that’s included in our deploy tools repo.

Note: if you’re using WP Engine, our deploy tools include a bunch of helpful goodies. You can read more about them here.

Assumptions

  • You have MySQL installed on your computer
  • You’re using INN’s deploy tools with your multisite project
  • The standalone blog uses the standard "wp_" WordPress database prefix

What it takes to migrate a standalone blog

My teammate Adam Schweigert put together this gist that describes the changes required to prepare a standalone blog’s database tables for a new home amongst other blogs in a multisite install.

The process involves renaming all of the blog’s tables from wp_[tablename] to wp_[new_blog_id]_[tablename]. For example, wp_posts might become wp_53_posts.

This is true for all tables except for wp_users and wp_usermeta. More on this later.
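To make the renaming concrete, here’s a rough Python sketch that generates those statements (illustrative only, not the actual deploy-tools code; the list covers the core tables, and plugins may add others that need the same treatment):

CORE_TABLES = [
    'commentmeta', 'comments', 'links', 'options', 'postmeta',
    'posts', 'terms', 'term_relationships', 'term_taxonomy',
]

def rename_statements(new_blog_id):
    # wp_users and wp_usermeta are deliberately excluded; see below.
    return ['RENAME TABLE wp_{0} TO wp_{1}_{0};'.format(table, new_blog_id)
            for table in CORE_TABLES]

for statement in rename_statements(53):
    print(statement)  # e.g. RENAME TABLE wp_posts TO wp_53_posts;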

This brings up the question of where the value for new_blog_id comes from. The easiest way to get this value is to create a new blog in your multisite install. By doing this, you’re creating a skeleton of database tables that your standalone blog will fill in.

Step one: create a new blog in your multisite install

After creating a new blog, find it in your Network > Sites list. Hover over the site’s link and you’ll see a URL in your status bar.

The “id=53” is the part you want. Make note of the site ID as you’ll need it later.

Step two: retrieving a database dump and running the single_to_multisite_migration command

Let’s look at the signature of the single_to_multisite_migration command:

def single_to_multisite_migration(name=None, new_blog_id=None, ftp_host=None, ftp_user=None, ftp_pass=None):
    ...

The name and new_blog_id parameters are required. The name will be used when creating a database in your local MySQL server. This is where we’ll load the single blog’s database dump. It doesn’t matter much what the name is, but it should conform to MySQL’s rules for identifiers.

The new_blog_id is the ID that you made note of earlier.

If you’re using WP Engine to host the standalone blog, deploy tools can retrieve a recent database dump for you automatically. For this to work, you’ll need to provide your FTP credentials when running single_to_multisite_migration.

Here’s an example:

$ fab single_to_multisite_migration:blogname,53,myinstallname.wpengine.com,ftpusername,ftppassword

If you’re not using WP Engine, you’ll need to get a database dump by some other means. Once you have it, place it in the root directory of your multisite project repo. The single_to_multisite_migration command expects a mysql.sql file in this location, so you may need to rename your dump file to meet this expectation.

After you have the mysql.sql dump in the root of your multisite project repo:

$ fab single_to_multisite_migration:blogname,53
Example output from the single_to_multisite_migration command

Step three: wait… rejoice! Your multisite_migration.sql file is ready!

Depending on how big your standalone blog’s database is, it may take a while for the command to finish.

Message indicating the single_to_multisite_migration command finished properly

Step four: apply the multisite_migration.sql file to your multisite database

I leave it up to you to decide how best to apply the migration file to your database. You may be able to import the SQL using phpMyAdmin, or, if you’re using WP Engine, you might contact their lovely support staff and ask them to apply it for you. Be clear that you DO NOT want to drop all tables in your multisite database before importing the multisite_migration.sql file.

Aside from renaming the standalone blog’s tables, what does single_to_multisite_migration do?

Great question. Here’s the short list:

  • Finds the maximum user ID in your multisite database and uses that value to offset the IDs of users in your standalone blog’s wp_users table so that they can be inserted into the multisite database without duplicate primary key errors (see the sketch after this list).
  • Finds the maximum user meta ID in your multisite database and uses that value to offset the umeta_id values in your standalone blog’s wp_usermeta table so that user meta can be inserted into the multisite database without duplicate primary key errors.
  • Retains the siteurl and home values in your multisite “skeleton” site’s wp_[new_blog_id]_options table. This means that you won’t have to (re)set your new multisite blog’s site URL and home URL values after applying the multisite_migration.sql file.
  • Looks through all of the standalone blog's posts to find "wp-content/uploads/" and replaces it with "wp-content/blogs.dir/[new_blog_id]/files/" to help with migrating uploads to the multisite install.
  • Uses REPLACE, UPDATE and some tricky subqueries to insert or update rows. This means you can apply the multisite_migration.sql file to your multisite database and avoid duplicate wp_users and wp_usermeta entries. This is helpful if you need to run several incremental migrations to get all of a blog’s content before making the official transition to the multisite install.
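To illustrate the ID-offset trick from the first two bullets, here’s a hypothetical Python sketch that generates equivalent SQL (the tool’s real queries are more involved, using REPLACE and subqueries as noted above):

def offset_statements(max_user_id, max_umeta_id):
    return [
        # Shift the standalone blog's user IDs past the multisite maximum
        # so inserts cannot collide with existing primary keys.
        'UPDATE wp_users SET ID = ID + {0};'.format(max_user_id),
        # Shift umeta_id primary keys the same way, and keep user_id
        # pointing at the shifted user IDs.
        'UPDATE wp_usermeta SET umeta_id = umeta_id + {0}, '
        'user_id = user_id + {1};'.format(max_umeta_id, max_user_id),
    ]

print('\n'.join(offset_statements(1200, 48000)))  # example maximum IDs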

Migrating uploads

The other thing you'll need to do is move your standalone blog's uploads directory. The single_to_multisite_migration command doesn't do this for you. You'll have to manually move the contents of the standalone blog's "wp-content/uploads/" directory to the multisite install's "wp-content/blogs.dir/[new_blog_id]/files/" directory.

Test your migrations thoroughly before deploying.

We’ve tested and used this method several times with great success. It works well for us.

That said, remember to thoroughly test migrations before applying them to any production environment.

Break your local development site or your staging server first and be happy about it! Then fix any mistakes and you'll be ready to apply to your production database.

Responsive Tables

A few weeks back, I spent time researching how best to build responsive tables.

Our requirements were pretty simple. We wanted the ability to load data from Google Drive, the ability to embed the tables using an iframe and a dead simple way of handling (re)publication.

Of course, the other essential requirement for responsive tables is that they work well on a variety of devices. In my opinion, the best solutions to this problem transform your table into an easily scrollable list of data on smaller devices.

To handle the transformation of the table on smaller screens, I started by using jQuery DataTables and writing my own code, but came across Tablesaw.js, which already solved the problem.

For the other requirements, I settled on Tabletop.js to handle loading the data, Pym.js for embedding via iframe and a tiny render script, written in Python, to make generating a ready-to-deploy responsive table quick and easy.

How it works

Once you’ve set up a spreadsheet in Google Drive and grabbed the spreadsheet’s public key (described in the Tabletop.js documentation), clone the responsive-tables repository.

First, you’ll want to install requirements, optionally creating a virtualenv:

$ mkvirtualenv tables
$ pip install -r requirements.txt

If you’re not familiar with Python, pip and virtualenv, read more about them here.

Configuration

Next, you’ll create a config file for your table. You can get started by copying the example:

$ cp config-example.json config.json

Edit config.json, filling in each of the blanks with your information.

The ua_code field is your Google Analytics UA code in the format “UA-XXXXXXXX-X”. If you leave it blank or remove it, Google Analytics will be disabled.

The columns field determines which parts of the data from Google Drive should be included in the table. The data structure defining the columns is an array of arrays; each inner array represents a column and contains two items.

The first is the column identifier. This is the lowercase column title stripped of punctuation. For example, if your column title is “Must be 501(c)(3)?” your column identifier would be “mustbe501c3.”
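Here’s a quick Python approximation of that derivation, in case you want to sanity-check an identifier (a sketch only; the actual normalization happens in the spreadsheet data that Tabletop.js reads):

import re

def column_identifier(title):
    # Lowercase the title and strip everything that isn't a letter or a
    # digit -- an approximation of how column headers are normalized.
    return re.sub(r'[^a-z0-9]', '', title.lower())

print(column_identifier('Must be 501(c)(3)?'))  # -> mustbe501c3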

The second is how you’d like the column title to appear in your rendered table. You can stick with the title you used in Google Drive or change it entirely. For example, you could use “Must be 501(c)(3)?” or instead choose “Need 501(c)(3).”

The config.json file for the example looks like:

{
  "ua_code": "UA-20502211-1",
  "title": "Discounts for nonprofits",
  "key": "10yccwbMYeIHdcRQazaNOaHSkpoSa1SUJEtWBfWPsgx0",
  "columns": [
    ["service", "Service"],
    ["whatisit", "What is it?"],
    ["whatsthediscount", "What's the discount?"],
    ["mustbe501c3", "Must be 501(c)(3)?"],
    ["moreinfo", "More info"]
  ]
}

Rendering your table

Once you’ve created a config file, you can render the table by running:

$ ./render.py

This will create a build directory adjacent to render.py and populate it with all of the files required to deploy the table. Note that if a build directory exists, anything within it will be deleted when calling render.py.

Once that’s finished, upload the contents of the build directory to your host of choice.

Other notes

The table’s breakpoint is set to 60em (960px at the browser’s default 16px font size). There is currently no simple way to change this value, which is why the README states this rig is best suited for tables with 5-7 columns. We may work out a simple way to adjust this in the future.

Also note that the file assets/responsiveTable.js contains some code that identifies long URLs in your table’s data, converts them to actual anchor tags and uses CSS to truncate their text to avoid ruining the table’s lovely responsiveness. You can adjust the width at which the text is truncated in style.css:

#data a.ellipsis-link {
  display: inline-block;
  max-width: 180px; /* adjust the width of truncated urls */
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
}

You can see an example of a table rendered with our rig here: http://nerds.inn.org/wp-content/uploads/static/discounts/

And an example of that same table embedded via iframe: http://nerds.inn.org/discounts/

View and fork the code here: https://github.com/INN/responsive-tables

Using Fabric for WP Engine WordPress Deployments

One of my first tasks at INN was to familiarize myself with the tools and technology the team currently uses.

What I noticed was that our deployment process was pretty darned manual. We host a number of our WordPress sites on WP Engine and while they allow deployment via git and access via SFTP, they don’t give us shell access to the server. So our deployment process up until this point has involved an FTP client, dragging and dropping files for transfer, maybe picking and choosing parts of our code to deploy based on what we've been hacking on. This approach works, but is fragile and requires a lot more thinking than I'm typically willing to do. Remember, laziness is a virtue.

Requirements

Having a scripted deployment process means that the steps to deploy your code will be the same no matter who has the responsibility. Keeping a repo with your deployment scripts means the process is documented and can be critiqued and iterated upon.

We’re using Fabric because I am familiar with it. By all means, use Capistrano or some other deployment tool. Write a few shell scripts if that’s what you’re comfortable with. Do what you can to put the software to work for you.

A sound deployment strategy meets these criteria:

Extensibility

Want to deploy via git or sftp? Deploy a specific branch? Want to minify and compress assets before deployment? Maybe tag a commit for rollback purposes? Your deployment strategy should be extensible enough to accommodate these things and require little to no input from you. Write a script once and put it to work.

$ fab staging branch:exciting_new_feature deploy

Push button

Type the deployment command into your shell and hit enter.

$ fab staging master deploy

Rollback

If something goes wrong, issue another command to revert to a previous state and deploy.

$ fab staging rollback deploy

Fabric, git-ftp and WP Engine

Fabric is a wonderful Python library that allows you to script away application deployments and remote system administration tasks. It provides a way to wrap up what would otherwise be complex git commands/workflows in simple, easy-to-remember fab commands.

Some of the things our fabfile is set up to handle include:

  • Use git-ftp to deploy files to WP Engine via SFTP.
  • Check the commit hash of currently-deployed code and automatically tag it as a rollback point.
  • Allow us to check out and deploy a specific branch to a specific environment (i.e., master or stable to staging or production); see the sketch after this list.
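For a feel of how little code this takes, here’s a minimal, hypothetical sketch of branch and deploy tasks (not the actual deploy-tools implementation; it assumes git-ftp is installed and env is configured as shown later in this post):

from fabric.api import env, local

def branch(name):
    # Select an arbitrary branch to work on.
    env.branch = name

def deploy():
    # Check out the selected branch and push it over SFTP via git-ftp.
    local('git checkout {0}'.format(env.branch))
    local('git ftp push --user {0} --passwd {1} sftp://{2}/'.format(
        env.user, env.password, env.hosts[0]))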

Why use git-ftp? In our experience, git deployment for WP Engine has been difficult to use and unreliable. Also, WP Engine requires an entire WordPress install as the basis for your repository, which isn’t how most of our repos are structured. Git-ftp allows us to work around these issues.

However, if we want to move to a git push-based deployment strategy or expand our toolkit to work with environments outside of WP Engine, adjusting or adding to our fabfile is trivial.

Usage

If you want to use our deployment tools with your project, you can follow the instructions in the README included in the repo. The README holds all the details regarding software prerequisites and how to install them.

If you’re already familiar with Python and Fabric and have your dev environment set up, you can find the “quick start” steps below.

These tools are meant to be added as a submodule in your project repo. This is based on the way Ryan Mark structured the Chicago Tribune’s deploy tools. To start:

$ git submodule add https://github.com/INN/deploy-tools.git tools

Then copy the files from tools/examples to the root of your project:

$ cp -Rf tools/examples/* ./

Finally, edit `fabfile.py` to use your WP Engine host, username and password:

# Environments
def production():
    """
    Work on production environment
    """
    env.settings = 'production'
    env.hosts = ['mywpinstall.wpengine.com', ]
    env.user = 'mywpinstallusername'
    env.password = 'mywpinstallpassword'

After that, add and commit your changes. At this point, you should be able to deploy:

$ fab staging master deploy

If you want to perform a dry run beforehand to see exactly what files will be transferred:

$ fab staging master dry_run deploy

If you want to see a list of all available commands:

$ fab -l

Available commands:

branch                Work on any specified branch.
deploy                Deploy local copy of repository to target environment.
dry_run               Don't transfer files, just output what would happen during a real deployment.
master                Work on development branch.
path                  Specify the project's path on remote server.
production            Work on production environment
rollback              Deploy the most recent rollback point.
stable                Work on stable branch.
staging               Work on staging environment
verify_prerequisites  Checks to make sure you have curl (with ssh) and git-ftp installed, installs them if you do not.

Todos

Future plans include adding commands to handle asset minification and compression.

If you'd like to contribute, you can fork the code here.