
What is a Robots.txt File? A Simple Guide for Your Website
In this guide, I’ll show you what a robots.txt file is, how to create one for your website, and what content you should or shouldn’t include in it. I’ll also provide a quick example of a basic robots.txt file that covers the essentials and works for any simple website. If you want to learn how it works in more detail, continue reading the full guide.
Quick Example
Open Notepad (or any plain text editor), enter the following text, save it as robots.txt, and place it at the root of your website so the file appears at yourwebsite.com/robots.txt. That’s all you need for a basic setup.
User-agent: *
Disallow:
This standard, minimal example meets the requirement for having a robots.txt file on your website. It tells search engines (and other web robots) that they are allowed to access all parts of your site, which is ideal for a new or simple website.
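If you use Python, you can sanity-check these rules with the standard library’s urllib.robotparser module. This is a quick illustrative sketch; yourwebsite.com is just a placeholder for your own domain.

```python
from urllib.robotparser import RobotFileParser

# The allow-all rules from the quick example above.
rules = [
    "User-agent: *",
    "Disallow:",
]

rp = RobotFileParser()
rp.parse(rules)

# With an empty Disallow, every path is allowed for every bot.
print(rp.can_fetch("*", "https://yourwebsite.com/any/page.html"))  # True
```

This is the same parser Python tools use to decide whether a crawler may fetch a URL, so it is a convenient way to preview how your rules will be interpreted.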
What is a robots.txt file?
A robots.txt file is a simple text file that gives instructions to web robots (also called web crawlers, spiders, or bots) about your website. It can contain one or more records, but most websites only need a single record, like the example in the “Quick Example” section above. The main purpose of this file is to control how web robots access and interact with your site.
What are Web Robots?
Web robots come in different types, each with a specific purpose. The most common are search engine crawlers such as Google and Bing, but there are also content scrapers, monitoring bots, and social media bots. Here’s a brief breakdown of each type:
- Search Engines: Visit websites, read content, and index it so it can appear in search results.
- Content Scrapers: Copy website content (sometimes for legitimate purposes, but often for spam).
- Monitoring Bots: Track website availability, performance, changes, or SEO metrics.
- Social Media Bots: Fetch link previews, images, and metadata when links are shared.
Now let’s create the file and add content to it.
How to Create a robots.txt File?
To create a robots.txt file, simply open Notepad (or any plain text editor), add the content you want, and save the file as robots.txt. The file name is important, so make sure you double-check it before uploading.
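As an alternative to Notepad, here is a minimal Python sketch that writes the same allow-all rules to a robots.txt file (you would still upload the resulting file to your web root):

```python
from pathlib import Path

# The allow-all rules from the quick example, written to disk.
content = "User-agent: *\nDisallow:\n"
Path("robots.txt").write_text(content, encoding="utf-8")

# Read it back to confirm the file name and contents are correct.
print(Path("robots.txt").read_text(encoding="utf-8"))
```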
Where to Put robots.txt file?
You should upload the robots.txt file to the root directory of your website, the same location where your index file is stored. Once uploaded, the file URL should look like this: yourwebsite.com/robots.txt.
Now let’s look at the most common instructions you can add to robots.txt. We’ll explore the syntax and purpose of each option one by one.
Allow All Robots Complete Access
User-agent: *
Disallow:
This gives all web robots full access to your website.
Block All Robots from Accessing the Entire Site
User-agent: *
Disallow: /
This blocks all robots from accessing your entire site. It’s commonly used when a website is still under development or is just a test site, so you don’t want search engines or bots to index it yet. You can also block access to specific folders that are not meant for public view.
Block Robots from Part of the Website
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
Disallow: /tmp/
This restricts access to specific folders while allowing robots to crawl the rest of your site.
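You can verify this behavior with Python’s urllib.robotparser, assuming the example folders above and a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
]

rp = RobotFileParser()
rp.parse(rules)

# Paths under a disallowed folder are blocked; everything else is allowed.
print(rp.can_fetch("*", "https://yourwebsite.com/private/notes.html"))  # False
print(rp.can_fetch("*", "https://yourwebsite.com/blog/post.html"))      # True
```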
Block Complete Access for a Specific Bot
User-agent: BadBot
Disallow: /
This prevents a particular bot, such as a malicious or unwanted crawler, from accessing your site.
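One detail worth knowing: bots that have no matching record are allowed by default. The sketch below (BadBot is a made-up name from the example above) shows that blocking one bot does not affect others:

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: BadBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("BadBot", "https://yourwebsite.com/"))        # False
# A bot with no matching record falls back to "allowed".
print(rp.can_fetch("SomeOtherBot", "https://yourwebsite.com/"))  # True
```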
Allow Access for One Specific Bot
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
This allows a specific bot (here Googlebot, Google’s crawler token) to access your site while blocking all other bots. Note the blank line separating the two records.
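Again, urllib.robotparser can confirm the two records behave as intended (Googlebot here is Google’s crawler user-agent token; the other bot name is made up):

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# The named bot matches its own record; everyone else hits the catch-all.
print(rp.can_fetch("Googlebot", "https://yourwebsite.com/"))     # True
print(rp.can_fetch("SomeOtherBot", "https://yourwebsite.com/"))  # False
```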
Sitemaps in a robots.txt File
Another common instruction to include in your robots.txt file is the link to your main sitemap. Here’s an example:
User-agent: *
Disallow:
Sitemap: https://yourwebsite.com/sitemap.xml
Including your sitemap in the robots.txt is not required, but it’s highly recommended. It helps search engines discover your sitemaps more quickly and efficiently, especially if your website has multiple sitemaps.
For example, if you have several sitemaps, you can list them all like this:
User-agent: *
Disallow:
Sitemap: https://yourwebsite.com/sitemap.xml
Sitemap: https://yourwebsite.com/sitemap-authors.xml
Sitemap: https://yourwebsite.com/sitemap-products.xml
This ensures that search engines can easily find all parts of your website and index your content properly.
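Sitemap lines are machine-readable too. On Python 3.8+, urllib.robotparser exposes them via site_maps() (the sitemap URLs below are the placeholders from the example):

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow:",
    "Sitemap: https://yourwebsite.com/sitemap.xml",
    "Sitemap: https://yourwebsite.com/sitemap-authors.xml",
    "Sitemap: https://yourwebsite.com/sitemap-products.xml",
]

rp = RobotFileParser()
rp.parse(rules)

# site_maps() returns every Sitemap line, or None if there are none.
print(rp.site_maps())
```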
Tip: Test Your robots.txt File
After creating and uploading your complete robots.txt file, it’s a good idea to test it to make sure all your rules work as intended. You can use a tool such as the robots.txt report in Google Search Console, or simply visit the file in your browser (yourwebsite.com/robots.txt) to check that it’s accessible and contains the correct instructions.
Testing your file helps ensure that search engines and other bots follow your intended rules and that your sitemaps and folder restrictions are recognized properly.
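You can also run a small offline check before the file goes live. This sketch parses the rules you plan to upload and asserts the behavior you expect; the rules, agents, and URLs are placeholders for your own:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content you intend to upload (placeholder rules).
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: https://yourwebsite.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Each entry: (user-agent, URL, expected can_fetch result).
expectations = [
    ("*", "https://yourwebsite.com/", True),
    ("*", "https://yourwebsite.com/private/secret.html", False),
]

for agent, url, expected in expectations:
    result = rp.can_fetch(agent, url)
    assert result == expected, f"{agent} on {url}: got {result}, expected {expected}"
print("All robots.txt expectations passed.")
```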
Conclusion
Thanks for reading! This guide is not a fully detailed technical explanation of how a robots.txt file works. The goal is to provide a simple and informative guide to help you create a basic robots.txt file to get started.
Note: Some bad bots may simply ignore this file. It serves as a guideline, not an enforcement tool, so don’t rely on it to hide pages from the public; it won’t work that way.
You can learn more about robots.txt in detail at robotstxt.org. If you have any questions, feel free to leave them in the comments!
For additional useful guides and tech tips, explore our Tips & Tricks category.