Master the essential technical files so Google understands and indexes your site correctly: robots.txt, XML sitemap and structured data.
Before appearing in search results, your site must be discovered, crawled and indexed by Google. The robots.txt and sitemap.xml files guide Google's bots, while structured data helps them understand your content.
A misconfiguration of these elements can block indexing of your important pages or waste your crawl budget on low-value pages.
The robots.txt file is a plain-text file placed at your site's root that tells crawlers (Googlebot, Bingbot, etc.) which URLs they may or may not crawl. Note that it controls crawling, not indexing: a disallowed URL can still appear in search results if other sites link to it.
Robots.txt example:
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://www.your-site.com/sitemap.xml
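You can sanity-check rules like these with Python's standard-library parser. One caveat, shown below: RobotFileParser applies rules in file order (first match wins), unlike Google, which applies the most specific rule, so the Disallow lines are listed before "Allow: /" here. The "MyBot" user agent is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, with Disallow lines first because
# Python's parser stops at the first matching rule.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /private/",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# A regular page is crawlable; anything under /admin/ is not.
print(parser.can_fetch("MyBot", "https://www.your-site.com/blog/post"))    # True
print(parser.can_fetch("MyBot", "https://www.your-site.com/admin/login"))  # False
```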
The XML sitemap is a structured list of all pages on your site that you want indexed. It helps Google discover your pages faster and understand your site structure.
Sitemap structure:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.your-site.com/</loc>
<lastmod>2024-01-15</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
</urlset>
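For sites with more than a handful of pages, sitemaps are usually generated rather than written by hand. Here is a minimal sketch using only the standard library; the URL list and dates are placeholders, and a real generator would pull them from your CMS or database.

```python
import xml.etree.ElementTree as ET

# Official sitemap namespace, as used in the example above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a sitemap XML string from (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

sitemap_xml = build_sitemap([
    ("https://www.your-site.com/", "2024-01-15"),
    ("https://www.your-site.com/blog/", "2024-01-10"),
])
print(sitemap_xml)
```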
Schema.org is a structured data vocabulary that helps search engines understand your page content. It enables rich snippets in Google results.
Example of a rich snippet in search results (here, for a recipe): ⏱ 45 min • 🍽 8 servings • 285 kcal
JSON-LD example for an article:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "Article Title",
"author": {"@type": "Person", "name": "Author"},
"datePublished": "2024-01-15"
}
</script>
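The first step any structured-data validator performs is extracting the JSON-LD blocks from the HTML and checking that they parse. A rough sketch of that step with the standard library, reusing the Article example above:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks found in <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        # Parse the script body; a malformed block raises json.JSONDecodeError.
        if self.in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

html = """<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Article Title", "datePublished": "2024-01-15"}
</script>"""

extractor = JSONLDExtractor()
extractor.feed(html)
print(extractor.blocks[0]["@type"])  # Article
```

A full validator would additionally check the extracted object against the Schema.org vocabulary; this sketch only verifies the markup is present and syntactically valid.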
The robots meta tag lets you control indexing and link following at the page level, offering finer control than robots.txt. For a noindex directive to take effect, Googlebot must be able to crawl the page and see the tag, so do not also block that page in robots.txt.
Common examples:
<meta name="robots" content="index, follow"> (the default: index the page and follow its links)
<meta name="robots" content="noindex, follow"> (keep the page out of the index, but follow its links)
<meta name="robots" content="noindex, nofollow"> (exclude the page and ignore its links)
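Checking a page for a noindex directive can be automated. A minimal sketch with the standard library, assuming the directive appears in a robots meta tag as in the examples above (real crawlers also honor the X-Robots-Tag HTTP header):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives += [d.strip().lower() for d in a.get("content", "").split(",")]

def is_indexable(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    # No robots meta tag at all also means indexable by default.
    return "noindex" not in parser.directives

print(is_indexable('<meta name="robots" content="index, follow">'))    # True
print(is_indexable('<meta name="robots" content="noindex, follow">'))  # False
```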
Our tool automatically checks the presence and configuration of all these technical elements essential to your site's indexing.