Companies doing programmatic SEO typically generate a TON of pages on their website. With all these pages comes a number of technical issues, which we’ll go over here. As with most posts on our blog, we try and avoid rehashing what’s obvious and already covered elsewhere. So we’ll focus exclusively on the challenges unique to large scale, programmatic SEO.
Getting Pages Crawled
If Google can’t find a page, it can’t rank it. And even if you submit an orphaned page manually or via a sitemap, you’re losing out on internal link juice you could be sending to that page.
For starters, make sure that every page on your site is categorized. A travel site, for example, probably has Countries, States, Cities, and Destination Types. Choose the broadest of these and use it to create an HTML sitemap that’s linked in the footer. Be sure to also create some sort of category in your backend for one-off pages like About Us, Media Relations, etc.
RetailMenot does this by breaking stores up alphabetically.
This ensures that no matter what, Google can find all of your pages. Categorization also makes it easy to establish breadcrumbs.
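A footer HTML sitemap like that can be generated straight from your categorized page records. Here's a minimal sketch – all of the page data and field names below are hypothetical placeholders for whatever your backend stores:

```python
# Sketch: build a simple HTML sitemap from categorized page records.
# The `pages` structure and its fields are hypothetical placeholders.
from collections import defaultdict

def build_html_sitemap(pages):
    """Group pages by category and emit a nested <ul> sitemap."""
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)
    parts = ["<ul>"]
    for category in sorted(by_category):
        parts.append(f"<li>{category}<ul>")
        for page in sorted(by_category[category], key=lambda p: p["title"]):
            parts.append(f'<li><a href="{page["url"]}">{page["title"]}</a></li>')
        parts.append("</ul></li>")
    parts.append("</ul>")
    return "\n".join(parts)

pages = [
    {"category": "Countries", "title": "Thailand", "url": "/countries/thailand"},
    {"category": "Cities", "title": "Bangkok", "url": "/cities/bangkok"},
]
html = build_html_sitemap(pages)
```

Regenerate this whenever pages are created or deleted, so the footer sitemap never drifts out of sync with the site.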
Of course, you don’t just want Google to find your pages – you want to rank those pages. The sitemap is not designed to replace internal linking – just to make sure no pages get orphaned. Internal linking makes sure that you are sending your hard-earned link juice to the right places.
Though Google has come a long way, PageRank is still at the core of the algorithm, and websites pass PageRank via internal links.
This is one of those areas where what’s best for the user and what’s best for SEO aren’t quite the same. Here are some best practices, at least SEO-wise.
Using breadcrumbs with your clearly defined hierarchy ensures that every subcategory is linked to its parent categories. Also, using schema markup allows you to tell Google verbatim what the hierarchy of your site is, and makes your result a bit prettier in the SERPs.
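As a sketch, the schema.org BreadcrumbList markup for a page's category trail might be generated like this – the trail names and URLs below are illustrative, not a real hierarchy:

```python
# Sketch: emit schema.org BreadcrumbList JSON-LD for a page's category trail.
# The names and URLs here are illustrative examples.
import json

def breadcrumb_jsonld(trail):
    """trail: ordered list of (name, url) from root category down to this page."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    })

markup = breadcrumb_jsonld([
    ("Thailand", "https://example.com/thailand"),
    ("Bangkok", "https://example.com/thailand/bangkok"),
])
```

Drop the output into a `<script type="application/ld+json">` tag on the page; because the trail comes from your category hierarchy, the markup stays correct as pages are added.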
Typically you don’t just want to internally link up and down your category hierarchy, but also to pages at the same level of your hierarchy.
Sometimes this is also good for the user. In other cases, internal linking is really just for SEO.
There are two ways companies typically go about internal linking in this situation.
The more complex solution is using product usage data to give the user the best possible recommendation. This is used by sites designed to be their own search engines of sorts – Yelp, Tripadvisor, and Indeed, for example. These sites have data on what the ‘best’ page for the user is, so it’s in their best interest to show it to the user – while also funneling link equity to that page. The downside is that some results pages may get orphaned, so make sure you design the logic so that every page gets linked to some extent.
The second solution is to just use categorization. If you’re like Thumbtack (the example above) and just want to spread out link equity, you can use some sort of categorization to decide which links to show. That could be metro areas or cities within a certain radius for a site targeting locations. This is much simpler and easier to maintain.
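The radius-based variant can be sketched with a haversine distance, picking the N closest same-level pages to link to. All of the city data below is made up for illustration; a real site would pull coordinates from its database:

```python
# Sketch: pick same-level internal links by geographic proximity.
# City coordinates below are illustrative placeholders.
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def nearby_links(current, cities, limit=5):
    """Return the `limit` closest cities to link to from `current`."""
    others = [c for c in cities if c["name"] != current["name"]]
    others.sort(key=lambda c: haversine_km(current["coords"], c["coords"]))
    return [c["name"] for c in others[:limit]]

cities = [
    {"name": "Philadelphia", "coords": (39.95, -75.17)},
    {"name": "Camden", "coords": (39.93, -75.12)},
    {"name": "Pittsburgh", "coords": (40.44, -80.00)},
]
links = nearby_links(cities[0], cities, limit=2)
```

Because the link set is deterministic, it's easy to verify that every page is linked from at least one neighbor – which is the orphaning check you need either way.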
Use commercial anchor text
Current best practices are to use commercial anchor text for internal linking. Tell Google what that page is about.
Make your most linked pages link to your money pages
Ideally, your most linked pages should directly link to your most valuable pages. This is why so many home pages link directly to big categories. Relevance also matters here. Make it a practice every quarter to go over the pages with your most external links, and ensure they are linking to money pages.
Minimize clicks from home page
Try and ensure your valuable pages – usually category pages – are no more than two clicks from the homepage. For less valuable pages that still get traffic – such as individual listing pages – try and aim for three or four clicks from the homepage.
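One way to audit click depth is a breadth-first search over your internal link graph, e.g. built from a crawl export. The toy graph below is illustrative:

```python
# Sketch: compute click depth from the homepage with BFS over an
# internal link graph. The graph here is a toy example; build yours
# from a crawl export (source URL -> list of linked URLs).
from collections import deque

def click_depths(graph, start="/"):
    """Return {url: minimum clicks from `start`} for all reachable pages."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in graph.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

graph = {
    "/": ["/hotels", "/flights"],
    "/hotels": ["/hotels/bangkok"],
    "/hotels/bangkok": ["/hotels/bangkok/riverside-inn"],
}
depths = click_depths(graph)
too_deep = [url for url, d in depths.items() if d > 2]
```

Pages missing from `depths` entirely are orphans; pages in `too_deep` are your category pages to surface closer to the homepage.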
Linking out to relevant resources matters
Internal linking isn’t just about passing link equity. You also want to demonstrate authority on topics you are trying to rank for. A page about ‘Travel Spots in Thailand’ that links to individual listings and guides to different cities in Thailand demonstrates more authority than just one page that doesn’t link to any others. It’s sort of the concept of a content hub.
Ensuring Google Likes Your Content
Just because you’ve ensured Google can find your pages with your sitemaps and internal linking doesn’t mean Google likes what’s on the page.
The first indication is your indexation rate. Remember the categories you created for each page type? You should also create an XML sitemap for each of these categories (that ideally automatically updates when new pages are created). Then submit those to Google Search Console, and GSC will tell you which of them are indexed. Note that the new search console often lies about individual pages not being indexed, but the indexation rate is still something to monitor.
This indexation rate should be above 90%. If it isn’t, you probably have a problem with some of your pages. More on fixing this later.
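Generating one XML sitemap per category can be as simple as this sketch using Python's standard library – the URLs are placeholders, and a real setup would regenerate the file whenever pages in that category change:

```python
# Sketch: generate one XML sitemap per page category so GSC can
# report indexation rate per category. URLs below are placeholders.
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def category_sitemap(urls):
    """Build a <urlset> sitemap document for one category's URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

xml = category_sitemap([
    "https://example.com/pa/philadelphia/home-cleaners",
    "https://example.com/pa/pittsburgh/home-cleaners",
])
```

Submit each category's sitemap separately in Search Console; that way a low indexation rate points you at the specific page type with the problem.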
Another measure of your page content is the cache date.
When it comes to sites that do programmatic SEO well, most results will have been cached within the last week. However, you should check the cache dates of your competition to get a benchmark. If your time from last cache is significantly above average for your space, that’s a sign your landing page content isn’t good.
Finally, keep tabs on your crawl budget. Generally if it’s going up, that’s a good sign. If it is staying constant, that’s not necessarily a bad thing. If it’s shrinking and the number of pages isn’t, that’s a bad sign. Additionally, you want to make sure the number of pages on your site is no more than 3x your daily crawl budget.
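Both of those heuristics are easy to codify. A sketch with illustrative numbers – real figures would come from your GSC crawl stats:

```python
# Sketch: two quick crawl-budget health checks based on the
# heuristics above. All numbers are illustrative.

def crawl_trend_shrinking(daily_crawls):
    """True if the recent average crawl rate fell versus the prior period."""
    half = len(daily_crawls) // 2
    earlier, recent = daily_crawls[:half], daily_crawls[half:]
    return sum(recent) / len(recent) < sum(earlier) / len(earlier)

def too_many_pages(total_pages, daily_crawl_budget, ratio=3):
    """True if the site exceeds ~ratio x its daily crawl budget in pages."""
    return total_pages > ratio * daily_crawl_budget

shrinking = crawl_trend_shrinking([120, 110, 115, 80, 75, 70])
oversized = too_many_pages(total_pages=500_000, daily_crawl_budget=100_000)
```

Wire checks like these into a weekly report so a shrinking crawl rate gets noticed before rankings slip.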
Dealing with Thin Content
If you see signs that Google doesn’t like your content, or you just look at your pages and can tell they are thin, you’ve got a couple options.
First, you can beef up your pages with more content. Read more on that here.
If that’s not an option, you can delete the pages. However, these pages are often still useful to users, in which case you’ll want to noindex them and likely nofollow any links to those pages.
Indeed does this with all of their individual job pages, as often they are thin, scraped, and have little SEO value.
You may not want to deindex a whole category of pages, however, as they may get traffic. For example, Yelp gets a ton of traffic from people searching for ‘restaurant + reviews’. In cases like this, you may want to programmatically set logic to noindex a page if it falls below a certain threshold. Our gut feel on this is that if there isn’t at least 3x as much content as exists in the template (nav, header, footer, related articles, etc.), that page is garbage in Google’s eyes.
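One reading of that rule of thumb, sketched as a simple threshold check – word counts stand in for whatever content metric you actually track:

```python
# Sketch: programmatic noindex decision using the 3x-template rule of
# thumb described above. Word counts are a stand-in content metric.

def should_noindex(page_word_count, template_word_count, ratio=3):
    """Noindex when unique page content is under `ratio` x the template."""
    return page_word_count < ratio * template_word_count

# A page with 400 words against a 200-word template gets noindexed;
# a page with 700 words stays indexed.
thin = should_noindex(page_word_count=400, template_word_count=200)
keep = should_noindex(page_word_count=700, template_word_count=200)
```

The key property is that the decision is re-evaluated on each page build, so a thin page that later accumulates reviews or listings flips back to indexable automatically.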
For sites that act as search or booking engines, you want the landing pages to have the most up-to-date information for users. For example, you probably don’t want to show a hotel that doesn’t have availability. However, you don’t want to have to request that information every time somebody loads the page. So, ideally you have a caching solution that purges and primes the cache whenever the information changes materially. Talk to your engineering team about that because we’re not going to tackle that topic in this post.
Spreading Link Equity Too Thin
As previously mentioned, PageRank is still at the heart of Google’s algorithm. Yes, relevance, RankBrain, E-A-T, etc. are all factors, but link equity is still a very real thing. The more pages your site has, the more you are spreading that link equity out.
How can you tell if you’re spreading yourself too thin?
Well, for one, if your indexation rate is low, that can be a sign. Especially if it looks like your content is good.
Another thing to make a practice of is measuring rank on new pages as they are generated. Typically, when your pages first get indexed, they’ll settle on a rank within a week or two. If the rank that new pages settle at declines over time (for similarly competitive terms), it’s a sign your link equity is getting spread thin. Unfortunately, there’s no hard and fast science here, but it’s worth being aware of.
Whether you see the signs of this or not, you can usually prune some pages to help put your link equity to better use. Do this by using a crawl tool like Screaming Frog to get all of your pages. Then download traffic data by landing page, match up the two datasets, and create a Pareto chart. If a large portion of your pages get zero or very little traffic, it might be worth pruning them via deletion and redirect or via noindexing.
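The Pareto step might look like this sketch in plain Python – the traffic numbers are made up, and a real run would join your crawl export against your analytics export:

```python
# Sketch: Pareto analysis of landing-page traffic to find pruning
# candidates. The traffic dict below is a made-up stand-in for a
# crawl export joined with analytics data.

def pareto_prune_candidates(traffic_by_url, min_share=0.99):
    """Return URLs outside the head set that drives `min_share` of traffic."""
    ranked = sorted(traffic_by_url.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(traffic_by_url.values()) or 1
    cumulative, keep = 0, set()
    for url, visits in ranked:
        if cumulative / total >= min_share:
            break
        keep.add(url)
        cumulative += visits
    return [url for url in traffic_by_url if url not in keep]

traffic = {"/a": 9000, "/b": 900, "/c": 90, "/d": 5, "/e": 5}
candidates = pareto_prune_candidates(traffic)
```

Treat the output as candidates, not a kill list – a zero-traffic page with external links should be redirected, not deleted outright.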
Duplicate Content
We’ve all heard of duplicate content before, but it can really become an issue in programmatic SEO. As your site grows larger, often the wrong page will rank in the search results instead of your money page. For example, you may have a transactional page about the best lawn care companies in Portland, OR getting outranked by a blog post about lawn care tips in Portland. Occasionally you might get two slots out of this, but often you’ll be showing the wrong page and getting a lower rank as a result.
First, you need a mechanism to let you know this is happening. For us, that’s pretty easy. When we work with a client, we use SerpDB to write all rankings to a database. Then, we programmatically create a table that maps each keyword to the desired landing page, and join them together. Clean url structure makes this easy.
For example, let’s say we have a url structure of https://www.[domain]/[state]/[city]/[category]. It’s quite easy to map the keyword ‘philadelphia pa home cleaners’ to the url https://www.domain.com/pa/philadelphia/home-cleaners. Make sense?
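A naive sketch of that keyword-to-url mapping – not our actual pipeline, and assuming every keyword follows a ‘city state category’ shape:

```python
# Sketch: map a local-intent keyword to its desired landing page using
# the /[state]/[city]/[category] URL pattern described above. The
# parsing is naive and assumes 'city state category...' keyword order.

def keyword_to_url(keyword, domain="www.domain.com"):
    """'philadelphia pa home cleaners' -> .../pa/philadelphia/home-cleaners"""
    tokens = keyword.lower().split()
    city, state, category = tokens[0], tokens[1], "-".join(tokens[2:])
    return f"https://{domain}/{state}/{city}/{category}"

url = keyword_to_url("philadelphia pa home cleaners")
```

Join the output against your rank-tracking table and any row where the ranking url differs from the mapped url is a cannibalization candidate.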
Then, we set up our data visualization, usually using Looker or Tableau, to have a field that checks whether the ranking url is the desired url.
Now, when it comes to fixing these, you have a few options. One is to do nothing if you’re ranking and converting well. A likely decision if it’s a one-off situation that doesn’t affect the bottom line much.
If it does affect the bottom line, you can try rewording one of the pages to target fewer overlapping keywords. You can try sending more link equity to the main page. If both pages have links, you might consider consolidating them and using a 301 redirect to send all that link juice to one page.
If it’s systematic, it’s likely that you need to rethink your categories. For example, if you have pages for [city] + home cleaners and [city] + cleaning services, maybe you consolidate those programmatically, 301 redirecting one category to the other to preserve link juice.
Finally, it is best to show the most unique content possible to the search engines, even if it seems natural to duplicate content for users. This is a big issue for location-based searches. For example, if you create a page to rank for ‘Washington DC hotels’, it might be the case that a hotel located across the river in Arlington, VA is one of the top hotels. However, if that hotel appears on both the Washington DC page and the Arlington, VA page, and maybe even pages for other bordering towns, each page is less unique. One instance of this is hardly problematic, but if your listings or other on-page content are similar across several pages, you are hurting yourself. Idealists might think ‘Google is smart enough to know the difference’ and that you shouldn’t worry about it, but as in so many cases, these clowns are wrong. This is a case where old best practices still very much apply, and we have seen enough data to know this.
We’re on a mission to de-bullshit-ify the world of SEO. If you want more real content and less fluff, we’d be honored if you joined our mailing list below.