We thought we’d done a good job covering the various facets of large-scale programmatic SEO – keyword research, competitive analysis, landing page creation, link building, and technical issues – but one topic kept coming up in comments and feedback that rarely gets discussed: testing.

Most marketers are familiar with testing as a concept, but usually in the context of landing page and feature tests.  A few folks have broached the topic of SEO testing, but it remains a not-so-talked-about subject.  So, we’re going to break it down.

Two types of testing in SEO

Testing things in SEO is far more difficult than with paid acquisition, landing page optimization, or product testing.  That’s simply because Google really doesn’t want anybody to run tests – they’d rather we all live in a fairyland where we ‘just focus on the user’.

There are really two types of testing in SEO, which we’ll explore in this piece.  The first is the scientific approach: a true A/B test where you split templated pages into groups and measure for a statistical difference.  The second is a more artful approach, where you measure before and after a change.  Finally, we’ll wrap up the piece by explaining why testing should often be the last thing on your mind, and why some things aren’t worth testing at all.  Here we go.

The Scientific Approach: A/B Testing Across URLs

Sites that already generate a lot of traffic and have templated page types can actually run fairly pure A/B tests on things like title tags and landing page layouts.  Engineers from Pinterest and Airbnb have laid out how they have done this.

Here’s how it works.  First, you decide what you want to test across pages – say, adding ‘free estimates’ to your title tags to see if that entices more users to click.  You then split your pages into two groups in a way that divides the traffic roughly in half.  Next, you make the title tag change on one group – the test group – and over the following weeks you watch to see whether traffic increases in the test group relative to the control group.
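To make that concrete, here’s a rough sketch of what assigning pages to groups might look like – the URLs and traffic numbers are made up, and real setups (like Pinterest’s and Airbnb’s) are more rigorous than this:

```python
# A minimal sketch: sort templated pages by traffic and deal them out
# alternately, so each group ends up with roughly half the traffic.
# URLs and traffic numbers here are made up.
pages = [
    {"url": "/plumbers/austin", "monthly_traffic": 3200},
    {"url": "/plumbers/boston", "monthly_traffic": 2900},
    {"url": "/plumbers/chicago", "monthly_traffic": 4100},
    {"url": "/plumbers/denver", "monthly_traffic": 1800},
    # ...the rest of the templated pages
]

ranked = sorted(pages, key=lambda p: p["monthly_traffic"], reverse=True)
control, test = ranked[::2], ranked[1::2]

total = sum(p["monthly_traffic"] for p in pages)
for name, group in (("control", control), ("test", test)):
    share = sum(p["monthly_traffic"] for p in group) / total
    print(f"{name}: {len(group)} pages, {share:.1%} of traffic")

# The test group gets the new title tag template (e.g. with 'Free Estimates');
# the control group keeps the existing one.
```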

This type of testing is typically done on title tags and meta descriptions, since those changes influence user behavior and you’ll start to see a difference right away.  Additionally, Google seems to adjust the search results for user-related signals much more quickly than it does for link or content changes.

One key challenge is choosing your sample properly.  You’ll want to make sure one group doesn’t have, say, a single page that accounts for a disproportionate amount of traffic, because that could skew the results.  The folks at Airbnb detailed the statistical approach to account for this if you wanna get nerdy, but we’ll spare you the math talk.
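If you just want a quick gut check without the full statistical treatment (this is not Airbnb’s method, only a rough sanity check), you could flag any page that dominates its group’s traffic before trusting the split:

```python
def flag_dominant_pages(group, threshold=0.20):
    """Flag pages that account for more than `threshold` of a group's traffic.
    A single dominant page can swing the whole group's numbers on its own.
    The 20% threshold is arbitrary -- tune it to your site."""
    total = sum(p["monthly_traffic"] for p in group)
    return [
        (p["url"], p["monthly_traffic"] / total)
        for p in group
        if p["monthly_traffic"] / total > threshold
    ]

# Example with made-up numbers: one page dwarfs the rest of its group.
test_group = [
    {"url": "/plumbers/new-york", "monthly_traffic": 45000},
    {"url": "/plumbers/boise", "monthly_traffic": 1200},
    {"url": "/plumbers/omaha", "monthly_traffic": 900},
]
for url, share in flag_dominant_pages(test_group):
    print(f"{url} drives {share:.0%} of the group's traffic -- "
          "consider excluding it or re-balancing the split")
```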

Metrics to Track

Why measure traffic?  Technically, in this sort of test you should be measuring conversions, since more traffic often means less intent – or at the very least you should keep an eye on conversion rate to make sure it doesn’t drop.  You can also export Google Search Console data and measure differences in CTR, though we’ve seen time and time again that Search Console data isn’t always accurate.  Finally, you could track rank.  Sometimes, though, you won’t actually see a difference in rank, especially if you’re already in the top few results – click-through rate is driving the win.  You also may not be tracking rank for all the keywords you could be ranking for, so rank tracking can only get you so far (and remember, we’re a rank tracking company saying this).
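If you do go the Search Console route, the comparison itself is simple – something along these lines, assuming a standard performance export with page, clicks, and impressions columns (the file name and test-group paths below are placeholders):

```python
import pandas as pd

# Hypothetical export from Search Console's performance report: one row per
# page with clicks and impressions over the test window. Column names depend
# on how you export, so adjust accordingly.
gsc = pd.read_csv("gsc_performance_export.csv")  # columns: page, clicks, impressions

# Paths of the pages that received the new title tag (the test group).
test_paths = {"/plumbers/boston", "/plumbers/denver", "/plumbers/seattle"}

gsc["group"] = gsc["page"].apply(
    lambda url: "test" if any(url.endswith(p) for p in test_paths) else "control"
)

summary = gsc.groupby("group")[["clicks", "impressions"]].sum()
summary["ctr"] = summary["clicks"] / summary["impressions"]
print(summary)
```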

Our recommendation?  Use traffic or conversions as the primary metric, but pay attention to the others too, as you can certainly learn from them.  We also recommend tracking bounce rate and/or dwell time to understand which factor in the test may be causing the result.  For example, we once ran a test that replaced a form on a page with a large button.  Traffic went up, but conversions went down.  Bounce rate and time on site for the button version were far better, leading us to believe the improved bounce rate was driving slightly higher rankings.  However, the drop in conversions wasn’t enough to offset the increase in traffic.

When we set up a testing system for a client, we pull in raw traffic and bounce rate data from Google Analytics and rank data from SerpDB, and put it into a monitoring dashboard like this:
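The dashboard itself varies by client, but the plumbing behind it is roughly this shape – daily traffic and bounce rate per page, daily rank per page, tagged by group and rolled up by day.  File and column names below are placeholders, not a real schema:

```python
import pandas as pd

# Placeholder exports -- file and column names depend on your analytics and
# rank-tracking setup; these are assumptions, not a real schema.
ga = pd.read_csv("ga_daily.csv")        # columns: date, page, sessions, bounce_rate
ranks = pd.read_csv("rank_daily.csv")   # columns: date, page, keyword, rank

test_paths = {"/plumbers/boston", "/plumbers/denver"}  # pages in the test group
for df in (ga, ranks):
    df["group"] = df["page"].isin(test_paths).map({True: "test", False: "control"})

daily = (
    ga.groupby(["date", "group"])
      .agg(sessions=("sessions", "sum"), bounce_rate=("bounce_rate", "mean"))
      .join(ranks.groupby(["date", "group"])["rank"].mean().rename("avg_rank"))
      .reset_index()
)
# One row per day per group -- this is what feeds the monitoring dashboard.
print(daily.tail())
```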

Limitations of Pure A/B Testing

For starters, you need a site with enough traffic to run a test.  You also need templated pages (as is usually the case in programmatic SEO).

Additionally, this type of testing should generally only be employed on sites that already have page 1 rankings across the board.  For one, you probably have better things to do if you aren’t there yet.  More importantly, the changes best suited to this kind of test are user-behavior related – link and content changes can take weeks or months to bake in.

Finally, this type of testing is no substitute for best practices.  In the Pinterest article, the author writes the following:

“we once noticed that Google Webmaster Tool detected too many duplicate title tags on our board pages. The title tags on board pages were set to be “{board_name} on Pinterest,” and there are many boards created by different users with the same names. The page title tag is known to be an important factor for SEO, so we wondered if keeping the title tags unique would increase traffic. We ran an experiment to reduce duplicate title tags by including Pin counts in the title tag, for instance “{board_name} on Pinterest | ({number} Pins).” But we found that there was no statistically significant change in traffic between the groups.”

This explanation, however, reflects a basic misunderstanding of duplicate title tags.  Users may not have a preference between the two versions, which would explain the test results.  But when Google sees duplicate content across your site, the algorithm is more likely to ding the whole site – not just a subsegment of pages – so a test split across that subsegment would show no delta.  I’m not saying this is definitively the case for Pinterest, but I’d keep the title tag with the number in it because it’s a best practice.

Just because a test doesn’t show a result doesn’t mean you should throw best practices out the window.  More on this in the sections to come.

The Artful Approach: Before and After Testing

Not all SEO testing needs to be or should be purely scientific.  Consider the following questions you might want to ask:

  • Does investing in page speed improvements across my whole site make a difference?
  • Should I prune thin content to keep link equity from spreading too thin?
  • Will several links from a PBN to my money page increase the rankings?
  • Will internal linking between city pages help my rankings overall?
  • How many links do I need to build to get my 2,500-word blog post to rank?

For all of the above hypotheses, running a simple split test won’t help you get to an answer.  That doesn’t mean we can’t test the hypotheses – we just need to be a bit more artful about it.  In many instances, the best you can do is make a change and observe the differences.
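At its simplest, that observation can still be quantified – compare a window before the change to a window after it, and be upfront that nothing is controlled for.  A minimal sketch, with a made-up file name and date:

```python
import pandas as pd

# Hypothetical daily organic traffic export for the page in question.
traffic = pd.read_csv("organic_pageviews_daily.csv", parse_dates=["date"])  # columns: date, pageviews

change_date = pd.Timestamp("2021-03-01")  # the day the change shipped (placeholder)
before = traffic.loc[traffic["date"] < change_date, "pageviews"]
after = traffic.loc[traffic["date"] >= change_date, "pageviews"]

print(f"before: {before.mean():.0f} pageviews/day over {len(before)} days")
print(f"after:  {after.mean():.0f} pageviews/day over {len(after)} days")
# No control group here, so treat the comparison as evidence, not proof --
# links, seasonality, and algorithm updates are all possible confounders.
```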

Let’s take an example.  In a Moz post, the folks at Pipedrive detailed the work they put in to rank #1 for the high-volume term ‘sales management’.  Here is an excerpt:

Hypothesis: We hypothesized that dropping the number of “sales management” occurrences from 48 to 20 and replacing it with terms that have high lexical relevance would improve rankings.  Were we right?  Our organic pageviews increased from nearly 0 to over 5,000 in just over 8 months.

Of course, many growth snobs who tout nothing but testing might scoff at this as an N of 1, saying it could just as easily be coincidence – especially since the post also describes months spent building guest post links to the guide.

However, the result followed the cause and it’s what intuition would suggest might happen.  Sometimes in SEO, that’s the best we’re going to get.

One flavor of the artful approach is what we call the kitchen sink approach.  This is useful when you care more about getting results within a timeframe than about identifying the exact factors behind them.  In the kitchen sink approach, you look at your goal and lay out every activity you can think of that might get the page to rank.  These might include:

  • Building a ton of links
  • Internally linking from the homepage
  • Putting structured data on the page
  • Putting jumplinks on the page
  • Optimizing the keyword density with Clearscope
  • Adding unique images
  • Improving the UX

Once you’ve identified everything, you do it all.  Then, a few months later, you measure the results (or you don’t).  You may not know whether every change had an effect, or whether some were pointless, but who cares?  Now you have a playbook for getting your stuff to rank.

Do you even need to be running tests?

Running tests is probably one of the last things you should be doing in SEO.  Testing, by nature, only delivers incremental gains and will only bring you to a local maximum.  There is often lower-hanging fruit in SEO than a 5% lift from changing title tags.

This is especially true for newer sites, where link building and content should be the priority, or for extremely old, authoritative sites that are jacked up from a technical perspective.

The other thing to consider is that many of the most impactful things in SEO are not testable, yet they are considered best practices for good reason.  Sometimes you should do things because Google itself has said to, because studies across numerous websites have shown they work, or simply because a healthy dose of common sense and intuition says so.

These include:

  • Beefing up landing pages with UGC
  • Adding all proper schema markup
  • Producing quality content relevant to your niche consistently
  • Ensuring all pages are crawlable
  • Ensuring your highest intent pages are heavily internally linked
  • Making sure keywords and synonyms are used in the right places
  • Ensuring your website loads quickly
  • Generating pages to rank for all possible search intent
  • Building links

All of these are things we as a community know to be impactful.  For many of them, we can’t run a true test – but I’d argue you shouldn’t be running tests on any of them.  We know they work, with little to no downside risk.

I spoke to an SEO manager at an up-and-coming aggregator-style marketplace.  He told me, “I can’t really justify a budget for link building because I can’t measure the effectiveness like I can with testing.”  That mentality, my friends, is why that website is in the shitter and going to lose to its competitors.  Google intentionally makes it hard to test many factors; that doesn’t mean those factors aren’t impactful.  Links are the prime example: they take a long time to work, and you never really know whether any one link did anything.  But if you build hundreds of links to a site over years, you’ll know that worked.

This isn’t to say don’t run tests.  It’s just to remind you that there are often things that aren’t testable, yet still impactful and worth considering.
