Programmatic SEO (Pt 1): How to Do Keyword Research at Scale


When I think of SEO, I think of two varieties. One is content-first: producing medium- to long-form informational content that is topically related to what your target audience wants to learn. Think HubSpot or Intercom. This is what most writing in the SEO world covers. The other is large-scale programmatic SEO, typically focused on scalably creating landing pages that rank for transactional intent.

In this series of posts, we're going to drill down and give a no-BS, step-by-step process for doing the more programmatic sort of SEO. You'll mostly see this variety in consumer-facing aggregators like Tripadvisor, Yelp, and Bankrate, but it's also pretty common in e-commerce, and there are a couple of B2B examples such as Software Advice.

This variety of SEO is much less about producing long-form authoritative content that educates and builds brand, and much more about creating high volumes of unique, user-friendly landing pages targeted at transactional terms.

Here's what we'll cover in this series.

Part 1 - Large scale keyword research

How to do keyword research at scale, and use that to guide your landing page creation.

Part 2 - Competitive analysis

No, this isn't the same rehashed crap showing you how to use Ahrefs or SEMRush.  In fact it has nothing to do with link profile at all.

This is about using data to determine who the big players are and to discover the whitespace in your vertical.

Part 3 - Creating landing pages at scale

Here we'll dive into how to create a landing page for each searcher intent.

Part 4 - Building links

It's not so easy to build links to your money pages in programmatic SEO. This is how you get around that.

Part 5 - Dealing with inventory: the technical issues you need to watch out for

In B2B SaaS you can get a long way with a simple WordPress theme. With programmatic SEO, there are a number of technical things to watch out for.

Large Scale Keyword Research

Keyword research is nothing new, but here we're going to apply the same concepts at scale.  This section assumes a basic understanding and intuition around keyword research.

1) Find your head terms

In nearly every sort of programmatic SEO, there are what we call head terms. These are the broad categories you'll be trying to rank for. Here are some examples:

  • Yelp: Restaurants, Gyms, Yoga Studios
  • Bankrate: Credit cards, bank accounts
  • Zillow: Real estate, homes for sale
  • Tripadvisor: Hotels, Things to Do
  • JCPenney: T-Shirts, Jeans, Polo Shirts

Typically head terms contain a great deal of search volume, but are also often searched with modifiers (more on this later).  Additionally, some head terms may be a parent or child of other terms.

To get a sense of the relative volume, you can use Google Trends or a keyword research tool. I like Google Trends for an at-a-glance view, which also shows seasonality.

It's important that you flesh out as many head terms as you can think of. Start by brainstorming a seed list. In the vacation rental space, you might start with the following:

  • vacation rentals
  • beach house rentals
  • beach houses
  • vacation homes

Then, try to flesh out the list as much as possible. Include singulars and plurals separately. It doesn't matter whether something has volume or not - we'll deal with that later. Here are some common, well-known tools and methods to flesh out those head terms:

  • Ubersuggest
  • Keywords Everywhere
  • Google AdWords Keyword Planner
  • Ahrefs, SEMRush, Moz, etc.
  • People also search for

Additionally, if you already have competitors in the space, look at their category pages and the keywords they use in their title tags.  Add any terms that stand out to the list.

Finally, search for some of the keywords (feel free to add a modifier), and see what title tags are being shown.  Pay very close attention to the bolded text as Google clearly sees that as a closely related term.

2) Figure out your modifiers

While most of the head terms will have a ton of search volume on their own, it's quite likely that the real volume comes from the head term in conjunction with one or more modifiers.

Now, we'll use the same tools and methods that are listed above to figure out all of our modifiers.

Primary vs secondary modifiers

It's important to at least have a best guess at what your primary and secondary modifiers are.  To explain this, here are some examples:

Primary Modifiers:

  • Shirts: v neck shirts, button down shirts, dress shirts
  • Credit cards: rewards credit cards, travel credit cards, cash back credit cards
  • Restaurants: Thai restaurants, Mexican restaurants, fine dining restaurants
  • House cleaner: Philadelphia house cleaner, New York house cleaner, house cleaner in Lancaster PA

Secondary Modifiers:

  • Shirts: best shirts, comfortable shirts, affordable shirts
  • Credit cards: best credit cards, credit cards for low credit
  • Restaurants: best restaurants, closest restaurants, cheapest restaurants

Primary modifiers tend to indicate a whole new category.  Thai restaurants is a category in and of itself.  Additionally, they are usually mutually exclusive.  You probably won't find many Thai Mexican restaurants.

Secondary modifiers can either modify the head term, or the head term + a primary modifier. For example, you could easily imagine people searching best Thai restaurants or affordable button down shirts. Generally speaking, you don't need to worry about getting volume for or tracking keywords with secondary modifiers.

Local: the easy case

If your business is targeting local intent, it's pretty easy to figure out your modifiers as they will likely be [head term] + [location].  The nature of your business will probably dictate whether that location is state (or state equivalent if you live outside America), city, or even neighborhood.  Nobody searches for the best Thai Food in Pennsylvania, but they may search for the cheapest car insurance in PA.

If you're city level, be sure you include city and city + state in your list of modifiers (we'll show how to put this all together).

There are plenty of places you can get this data.  The US Census is a good start, but won't have every little town and won't have neighborhoods.  Wikipedia often has neighborhoods, and typically every incorporated and unincorporated city / town within a county.  Nextdoor has neighborhoods, but often they aren't the names people search.  Use a virtual assistant to find as many as possible.

Also, you will often find a great deal of volume for [head term] + "near me".  Usually Google picks up on this.

3) Putting it all together with Python

Now, we'll want to go ahead and put our head terms and modifiers together to get a large list of keywords. If you're at a large scale, it's not unreasonable to be tracking 10k-200k keywords, but you can still get some good insights with as few as 2,000.

Our goal here is to get every possible combination of head terms and modifiers. In most cases, you can probably skip the secondary modifiers, as Google tends to show the same pages for modified terms as for unmodified ones, though there are exceptions.

For illustrative purposes, here is a script focused on the vacation home rental space.  In reality, there would be far more cities and states, and probably more head terms.  Additionally, you'd probably want to import and export CSVs, but you get the point.
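The original script isn't reproduced here, so what follows is only a minimal sketch of what such a script might look like. The head terms come from the list above; the cities, states, and the `build_keywords` helper are hypothetical placeholders, not the author's actual data or code.

```python
from itertools import product

# Head terms from the vacation rental example above.
head_terms = ["vacation rentals", "beach house rentals", "beach houses", "vacation homes"]

# Hypothetical sample locations as (city, state abbreviation) pairs.
locations = [("Ocean City", "NJ"), ("Myrtle Beach", "SC"), ("Destin", "FL")]


def build_keywords(head_terms, locations):
    """Return (keyword, head_term, location) rows for every combination."""
    rows = []
    for term, (city, state) in product(head_terms, locations):
        # Cover both word orders, with city alone and with city + state.
        variants = [
            f"{term} {city}",
            f"{term} {city} {state}",
            f"{city} {term}",
            f"{city} {state} {term}",
        ]
        for keyword in variants:
            # The head term and location ride along as tags for later pivots.
            rows.append((keyword.lower(), term, f"{city}, {state}"))
    return rows


rows = build_keywords(head_terms, locations)

# Tab-separated output pastes cleanly into a spreadsheet.
for keyword, term, location in rows:
    print(f"{keyword}\t{term}\t{location}")
```

Each row carries its head term and location alongside the keyword, so the tags survive the trip into a spreadsheet.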

As you can see, when you run this, you get an exhaustive list of keyword options.

Then, just paste that into Excel using the text import functionality, and you've got a comprehensive list of keywords.  Additionally, you have effectively created keyword tags.  One tag is your head term, the other is your location.

Things to be careful of

Are modifiers universal or specific to certain head terms?

Sometimes you may have modifiers that don't make sense for every one of your head terms.

In the finance space, for example, the modifier "travel" goes well with "credit cards", but doesn't go well with "checking accounts". We'd recommend simply creating separate sets of modifiers for each head term.
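One way to keep separate modifier sets is a dictionary keyed by head term. This is just a sketch: the credit card modifiers come from the examples earlier in the post, while the checking account modifiers are made up for illustration.

```python
# Each head term gets only the modifiers that make sense for it.
# The "checking accounts" modifiers are hypothetical examples.
modifiers_by_head_term = {
    "credit cards": ["travel", "rewards", "cash back"],
    "checking accounts": ["free", "online"],
}

# Build (keyword, head_term, modifier) rows without cross-pollinating the sets.
keywords = [
    (f"{modifier} {head_term}", head_term, modifier)
    for head_term, modifiers in modifiers_by_head_term.items()
    for modifier in modifiers
]

for row in keywords:
    print(row)
```

Because "travel" only lives under "credit cards", you never generate "travel checking accounts".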

Include secondary modifiers?

Including secondary modifiers can be helpful, but is not necessary. If you do, make sure you have a column in your output file for the modifier.

Don't know Python?

It's pretty easy to do things like this!  And there are plenty of places to learn.  It comes pre-installed on Macs, and you can use this exact code as a starting point.

Prefixes vs suffixes

In many cases, you might have to classify modifiers as prefixes or suffixes, and go from there. For example, "rewards credit cards" has a prefix modifier, but "credit cards for travel" has a suffix modifier.
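Here is a rough sketch of that classification, assuming you tag each modifier by hand. The `apply_modifier` helper and the "prefix" / "suffix" labels are illustrative names, not part of any library.

```python
# Each modifier is tagged with where it sits relative to the head term.
modifiers = [
    ("rewards", "prefix"),
    ("cash back", "prefix"),
    ("for travel", "suffix"),
    ("for low credit", "suffix"),
]


def apply_modifier(head_term, modifier, position):
    """Place the modifier before or after the head term."""
    if position == "prefix":
        return f"{modifier} {head_term}"
    if position == "suffix":
        return f"{head_term} {modifier}"
    raise ValueError(f"unknown position: {position}")


keywords = [apply_modifier("credit cards", m, p) for m, p in modifiers]
print(keywords)
```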

4) Getting search volume

Yes, Rand Fishkin, we know the AdWords tool doesn't give accurate data, but we don't need accurate. What we need is directional data that we can use for comparison.

Most tools - including the AdWords planner - have a pretty low keyword limit. So, you'll want to have a virtual assistant break the list into chunks and run them, OR just use Keyword Keg, which allows for large-scale batches at a pretty reasonable cost.
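If you go the chunking route, a few lines of Python can split the list for you. This is a sketch; the batch size of 1,000 is an assumed limit, so check what your tool actually accepts.

```python
def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in keyword list; in practice this is your generated list.
keywords = [f"keyword {i}" for i in range(2500)]

# 1000 is a hypothetical per-upload limit.
batches = list(chunk(keywords, 1000))

for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {len(batch)} keywords")
```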

Once you do that, you have a list of keywords with search volume, but you need to reconnect it to your tagged list of keywords. Perfect job for a VLOOKUP. If your dataset is getting too large for Excel (~50k rows) or you're data geeks like us, you can also upload to SQL tables and join.
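For the SQL route, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and every number are made up for illustration; in practice you'd load both tables from your CSV exports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagged (keyword TEXT PRIMARY KEY, head_term TEXT, location TEXT)")
conn.execute("CREATE TABLE volume (keyword TEXT PRIMARY KEY, searches INTEGER)")

# Hypothetical tagged keywords (output of the keyword generation step).
conn.executemany(
    "INSERT INTO tagged VALUES (?, ?, ?)",
    [
        ("vacation rentals destin", "vacation rentals", "Destin, FL"),
        ("beach houses destin", "beach houses", "Destin, FL"),
        ("vacation homes destin fl", "vacation homes", "Destin, FL"),
    ],
)

# Hypothetical volume export; one keyword is missing from the tool's output.
conn.executemany(
    "INSERT INTO volume VALUES (?, ?)",
    [("vacation rentals destin", 1000), ("beach houses destin", 200)],
)

# LEFT JOIN keeps tagged keywords even when the tool returned no volume.
joined = conn.execute(
    """
    SELECT t.keyword, t.head_term, t.location, COALESCE(v.searches, 0) AS searches
    FROM tagged AS t
    LEFT JOIN volume AS v ON v.keyword = t.keyword
    ORDER BY t.keyword
    """
).fetchall()

for row in joined:
    print(row)
```

The LEFT JOIN plays the same role as a VLOOKUP with a default: keywords the tool didn't return stay in the list with a volume of zero.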

Now you have a pretty comprehensive set of keywords, tagged with head terms and modifiers, with search volume appended so that you can pivot to gain all sorts of valuable insights.

5) Visualizing search volume

Now that we have a giant dataset of keywords and volume, let's play around and see what we can learn. We'll be using a dummy dataset focused on dog-related services, with locations kept to the Philadelphia, Pittsburgh, and Columbus areas for simplicity.

And the script to put it together:
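The original script isn't reproduced here, so this is a sketch under assumptions: the head terms are the ones discussed in this section, the locations are trimmed to the three main cities, and the `patterns` dictionary is a hypothetical way to tag each keyword with its word order.

```python
from itertools import product

# Head terms mentioned in this section.
head_terms = [
    "dog boarding", "pet boarding", "dog walker", "dog walkers",
    "dog sitter", "dog sitters", "dog daycare", "doggy daycare",
    "dog walking services",
]

# Simplified to the three main cities; a real list would include suburbs.
locations = [("philadelphia", "pa"), ("pittsburgh", "pa"), ("columbus", "oh")]

# Named word-order patterns, so each keyword can be tagged with its pattern.
patterns = {
    "keyword + city": "{term} {city}",
    "city + keyword": "{city} {term}",
    "keyword + city + state": "{term} {city} {state}",
    "city + state + keyword": "{city} {state} {term}",
}

rows = [
    {
        "keyword": template.format(term=term, city=city, state=state),
        "head_term": term,
        "city": city,
        "pattern": name,
    }
    for term, (city, state), (name, template) in product(
        head_terms, locations, patterns.items()
    )
]

for row in rows[:8]:
    print(row)
```

Tagging the pattern up front makes it easy to pivot later on which phrasing people actually use.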

We used Keyword Keg to get all the volume, joined the keywords back to their head terms, and here we go.  First, let's check out which head term tends to have the most search volume.  The chart below groups all of the search volume by head term.  So volume for keywords 'philadelphia dog boarding', 'dog boarding pittsburgh pa', and 'columbus oh dog boarding' will all roll up into the 'dog boarding' category.

What can we see here? A lot, actually. For one, dog boarding dwarfs the rest of the terms. Additionally, there are plenty of terms almost not worth paying attention to, or at least worth rolling up into other categories - everything from dog walking services onward. One super interesting thing is that 'dog walkers' has slightly more volume than 'dog walker', and 'dog sitters' has nearly as much volume as 'dog sitter'. This implies that people are searching for lists, or are in the mode to comparison shop. That's a hint that we might want to design our landing pages around a comparison shopping experience. It also means I'd bet money that Rover is kicking Wag's ass in the SERPs, which we'll find out in the next post. Finally, the head term 'doggy daycare' has a sizable chunk of volume, so let's be sure to sprinkle that keyword in.

Since Google generally interprets plurals pretty well, let's group the singular and plural together. We'll also be risky and assume Google interprets 'doggy daycare' and 'dog daycare' the same. We haven't done anything to prove this yet, but hey, why not live life on the edge once in a while?
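A sketch of that grouping, assuming a hand-maintained mapping from each variant to a canonical head term. The volumes below are invented purely to show the roll-up and are not the numbers behind the charts.

```python
from collections import Counter

# Hand-maintained mapping of variants to a canonical head term.
canonical = {
    "dog walkers": "dog walker",
    "dog sitters": "dog sitter",
    "doggy daycare": "dog daycare",
}

# Hypothetical (head_term, monthly searches) rows; the numbers are made up.
rows = [
    ("dog boarding", 900),
    ("dog walker", 100),
    ("dog walkers", 110),
    ("dog sitter", 80),
    ("dog sitters", 70),
    ("dog daycare", 200),
    ("doggy daycare", 150),
]

# Roll volume up to the canonical head term.
volume_by_group = Counter()
for head_term, searches in rows:
    volume_by_group[canonical.get(head_term, head_term)] += searches

for group, total in volume_by_group.most_common():
    print(f"{group}: {total}")
```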

Now that's a nice manageable set of head terms. You may be asking, shouldn't we do more grouping, such as 'dog boarding' and 'pet boarding'? Sure, we could shoot from the hip, do a handful of searches and make that judgment call. But wouldn't we rather use data? We'll do that in the next post. But for now, let's nerd out on data and see what else we can learn.

Here's a view showing the record count by search volume.  Shocker, Google is telling us that most terms have zero search volume.

Of course, we know big G is full of shit, so we'll just exclude those and move on.

Here's a great chart for visualizing large-scale keyword research. It's known as a Pareto chart. The blue line shows the volume for each city. The red line shows the cumulative contribution to the total. The way to interpret this is that the top 4 cities in the Columbus metro area account for 77.7% of the total volume. This might inform where you focus your efforts.
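The numbers behind a Pareto chart are just a descending sort plus a running total. Here is a sketch with invented city volumes (they will not reproduce the 77.7% figure above):

```python
from itertools import accumulate

# Hypothetical monthly search volume per city; the numbers are made up.
volume_by_city = {
    "columbus": 500,
    "dublin": 120,
    "westerville": 90,
    "grove city": 60,
    "hilliard": 30,
}

# Sort cities from highest to lowest volume.
ranked = sorted(volume_by_city.items(), key=lambda item: item[1], reverse=True)
total = sum(volume for _, volume in ranked)

# The running total divided by the grand total gives the cumulative share.
running = accumulate(volume for _, volume in ranked)
pareto = [
    (city, volume, cumulative / total)
    for (city, volume), cumulative in zip(ranked, running)
]

for city, volume, share in pareto:
    print(f"{city:<12} {volume:>5} {share:6.1%}")
```

The second column corresponds to the per-city line on the chart, and the third to the cumulative line.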

It's quite common that some head terms are super concentrated in the head, while others are a total long tail.  Do keep in mind that because Google shows lots of long tail as zero, the effects may be skewed towards the head.

Finally, what is the most common search pattern people use? Is it city + state + keyword, keyword + city, or something else?

The data is showing that 'keyword + city' and 'city + keyword' are the most common types. Google is pretty good about interpreting intent these days so this isn't something to sweat, but even in 2018, sprinkling the right phrasing in can get you itty-bitty wins.


Now we've created a pretty exhaustive list of potential keywords. They all have search volume attached, and we've learned a thing or two about how people search.

In the next post, we'll really get into the meat of it, as we analyze the actual SERPs to do an assessment of the competitive landscape.

