Advanced search engine optimization

Website structure and content impact search engines’ ability to provide a good search experience. That’s why it is so important for you to use search engine optimization (SEO) best practices to close critical gaps. Some elements on your website can hinder the search experience, and this resource shows you how to amplify valuable content and avoid pitfalls.

Make sure to publish a robots.txt file and XML sitemap to tell search engines what content to index. 
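
For example, a robots.txt file at the root of your domain can also point crawlers to your XML sitemap. This is a minimal sketch; the domain and sitemap path are placeholders:

User-agent: *
Allow: /
Sitemap: https://www.example.gov/sitemap.xml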

Beyond these two must-have files, there are other key tools you can use to further optimize the content on individual pages:

  • Semantic elements
  • Canonical links
  • Robots meta tags

Use semantic elements

Use the <main> element to target particular content on a page

You can use the <main> element to target content to be indexed by search engines. When a <main> element is present, search engines will only collect the content inside the element. Be sure that all the content you want indexed is inside this element.

  • If you close the element too early, important content will not be indexed.
  • If you do not use a <main> element demarcating where the primary content of the page is to be found, search engines might pick up repetitive content (such as headers, footers, and sidebars) as part of a page’s content. 

Implement the <main> element as a stand-alone tag:

<body>
  <!-- Redundant header code and navigation elements, sidebars, etc. -->
  <main>
    <h1>This is your page title</h1>
    <p>This is the main text of your page</p>
  </main>
  <!-- Redundant footer code -->
  <!-- Various scripts, etc. -->
</body>

If possible, open the <main> tag just ahead of the <h1> for your page title. If you use breadcrumbs on your site, place the opening <main> tag between the breadcrumbs and the <h1> so that the repetitive text in the breadcrumb links is not indexed.
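
For example, on a page with breadcrumbs, the <main> tag might open directly after the breadcrumb navigation. In this sketch, the class name and link targets are placeholders:

<nav class="breadcrumbs">
  <a href="/">Home</a> &gt; <a href="/topics">Topics</a>
</nav>
<main>
  <h1>This is your page title</h1>
  <p>This is the main text of your page</p>
</main>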

Also use other semantic elements

In addition to the <main> element, you can add other semantic elements such as <header>, <nav>, and <footer> to demarcate these page sections for cleaner indexing by search engines.
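
A page skeleton that combines these elements with <main> might look like the following sketch; the link targets and footer text are placeholders:

<body>
  <header>
    <nav>
      <a href="/">Home</a>
      <a href="/topics">Topics</a>
    </nav>
  </header>
  <main>
    <h1>This is your page title</h1>
    <p>This is the main text of your page</p>
  </main>
  <footer>
    <p>Agency contact information, required links, etc.</p>
  </footer>
</body>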

Use canonical links

There are two good reasons to declare the canonical URL for a given page:

  1. Sites built with a content management system can easily become crawler traps
  2. Dynamic pagination can generate URLs that are unhelpful as search results

Avoid crawler traps

A crawler trap occurs when the search engine falls into a loop of visiting, opening, and discovering pages that seem new, but are variations of existing URLs. These URLs may have appended parameters such as tags, referring pages, tag manager tokens, page numbers, etc. 

Use a canonical link to tell search engines that https://www.example.gov/topic1 is the real URL for the page. 

<link rel="canonical" href="https://www.example.gov/topic1"/>

In this example, even if a search engine discovers other variations of the page like https://example.gov/topic1?sortby=desc, it will only index https://www.example.gov/topic1.

Prevent duplicate results for dynamic pages

Dynamic pagination separates list items into distinct pages and generates URLs like https://anotherexample.gov/resource?page=3. 

Use a canonical link to limit search engines to indexing only the first page of the list. This approach keeps the paginated variations out of search results, while users can still sort and page through the content on your site.
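
For example, assuming https://anotherexample.gov/resource is the first page of the list, each paginated page could carry the same canonical link in its <head>:

<!-- Included on https://anotherexample.gov/resource?page=3 and every other paginated URL -->
<link rel="canonical" href="https://anotherexample.gov/resource"/>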

Exclude content with robots meta tags

Sometimes it is difficult to exclude content with a robots.txt file. For example, you might not have access to edit this file, or you might have thousands of pages to exclude.

In these situations, you can use robots meta tags to exclude content from being indexed by search engines.

Here are some example meta tags and what they tell search engines. 

Do not index the page, but follow links on the page

<meta name="robots" content="noindex" />

Index the page, but do not follow links on the page

<meta name="robots" content="nofollow" />

Do not index the page and do not follow links on the page

<meta name="robots" content="noindex, nofollow" />
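
Robots meta tags belong in the <head> of each page you want to exclude. Here is a minimal sketch, assuming a hypothetical page that should stay out of search results entirely:

<head>
  <title>Example page to exclude</title>
  <!-- Do not index this page and do not follow its links -->
  <meta name="robots" content="noindex, nofollow" />
</head>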

What can I do next?