AdSense crawler errors: Check your robots.txt file for improved ad targeting and relevancy

Thursday, November 17, 2011 | 9:00:00 AM

This is the first post in our two-part AdSense Crawler Errors series.

There are many ways that publishers can optimize their sites for AdSense: opting in to text/image ads, upgrading to our preferred ad formats, and increasing ad coverage across a site are just a few of the better-known ones. But did you know that there’s another straightforward optimization tip that many publishers overlook?

A bit of context
Your site’s robots.txt file essentially acts as a gatekeeper that determines which web crawlers, web robots, and search engines have access to your site and which do not. Those that are granted permission can do things like view your pages and index your site. Those that don’t have permission are not able to view or index specific sections of your site, depending on what you’ve specified.

AdSense ads are targeted with the help of an AdSense web crawler. That crawler scans your page’s content and determines which ads to display, according to specific keywords. If our AdSense crawler is blocked by your robots.txt file, we’re going to have a difficult time displaying relevant ads on your site. As a result, your users may see less relevant ads, which can lead to a lower click-through rate (CTR).

How you can help yourself

View the contents of your robots.txt file by going to [yourdomain.com]/robots.txt. (If you have a subdomain, it likely has a robots.txt file as well, located at [sub.yourdomain.com]/robots.txt.) Be sure that the file is configured to allow our AdSense ad crawler to view your site. You can do that by adding the following two lines to the very top of the file:
User-agent: Mediapartners-Google
Disallow:
This will ensure that our AdSense ad crawler can access your site and will help display more relevant ads. As a result, you can potentially benefit from increased ad revenue. Please note that making this change will not impact your Google search rankings. Adding these two lines to your robots.txt file will only help to deliver better, more relevant ads to pages with AdSense code already on them. Pages that don’t have AdSense ad code will not be affected.
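
If you'd like to check a robots.txt file before uploading it, the effect of those two lines can be sketched in a few lines of Python. This is only a rough illustration that uses the standard library's `urllib.robotparser`, which approximates how crawlers read robots.txt and is not the AdSense crawler's own logic. The helper name `adsense_crawler_allowed`, the two sample files, and the `example.com` URL are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def adsense_crawler_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Mediapartners-Google fetch url.

    Hypothetical helper for illustration; it parses the text locally and
    makes no network requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Mediapartners-Google", url)


# A file that blocks every crawler, including the AdSense crawler.
BLOCKING = """User-agent: *
Disallow: /
"""

# The same file with the two recommended lines added at the very top.
FIXED = """User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /
"""

PAGE = "http://example.com/page.html"  # placeholder URL

print(adsense_crawler_allowed(BLOCKING, PAGE))
print(adsense_crawler_allowed(FIXED, PAGE))
```

With the first file the AdSense crawler is blocked along with everything else. With the second file it is allowed, while the `User-agent: *` rule still applies to other crawlers.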

If you have URLs with any errors, you can see what they are by logging in to your AdSense account and clicking on ‘Account Settings’ from the home page. From there, click on ‘View errors’ under ‘Access and Authorization.’

Stay tuned for the second post in our AdSense Crawler Errors series, where we’ll cover crawler login issues and how you can solve them.

Posted by Andrew Boni - Inside AdSense Team

22 comments:

mplx said...

If you search the web you'll find a lot of recommendations for

User-agent: Mediapartners-Google*
Disallow:

In theory (according to robotstxt.org) it's not allowed to use * as wildcard, it's allowed just alone as "User-agent: *" meaning "any bot".

Is there a practical difference - would Google release a bot with something added to the string "Mediapartners-Google"? And would the current Google bot still recognize it as intended?

Do you suggest to add or remove the wildcard?

David Portney said...

Hmm, so are you saying that the default behavior for your AdSense web crawler is to NOT crawl a site, and the robots.txt addition you suggest is "required" in order for the AdSense bot to crawl effectively, or crawl at all?

ionebeck said...

Here's the info I need. I'm happy to come here and see info about AdSense.

aa said...

Adsense team should better cooperate with other google teams especially social networks for example orkut:


Since last week due to some issue on robots.txt in orkut.com.br google adsense crawler has troubles reaching applications pages on the site.

Which causes less targeted ads and less revenue for developers and for google overall.


Since many applications and games are hosted on Orkut and in the future will be hosted maybe on G+ it's about time for adsense team to instruct their other teams to allow adsense crawler to crawl these sites (i.e. update robots.txt which under their control) so app developers who build on top of google social networks could benefit from adsense.

Cotalika said...

I'm getting crawler errors in my Google AdSense account. Is this a danger to my account?

And where do the crawler errors come from?

Thank you.

Shishir said...

Sir/ Ma'am,

My robots.txt file reads
User-agent: *
Disallow: /
Is this ok..??
Because Google Webmaster says that it is blocking crawlers from crawling my site...
Please suggest some remedy..!!!

The Best Scent Store said...

Good stuff I have learned a lot from reading these comments thanks everyone

Daniel Mitchell said...

How does this apply to blogger (blogspot) blogs?

QWIJIBO said...

Hi, I have a crawler error problem, with a blocked URL and the reason for it being blocked is robots.txt file. When I went to my (domain).blogspot.com/robots.txt, it had:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /


Which is what it's supposed to be, right? So what is causing the crawler error and how do I fix it? The site is not showing up on adsense. Let me know what to do next, thanks in advance!

Jon Robet said...

Thank you for the article.

How does robots.txt affect ads?

Phyllis Segura said...

Apparently ads show up on my blog but I do not see them on my computer. Why not? How can I see and select them? And how can I find your response to my question?

RabinsXP.com said...

Thanks for the help. Now I have adjusted the robots.txt. Hope my CTR will increase.
Thanks to Inside Google Team for help.

Fashion Industry said...

Some of the Crawl Errors that appear in my report are NOT pages on my site.

knowledgebook said...

Why does my website show a crawler error in Webmaster Tools? My website is www.kashmirimage.in

5abiraag said...

Thanks a lot for this useful post. I recently got an AdSense message for my 5abi Raag and 5abi Raag. AdSense said that I have 43 crawling errors in 10 days. Thanks for informing me.

regards
gagan masoun

Gus said...

None of the crawl errors shown are for our site. http://tvtropes.org

Gus said...

By the way, you sent us another message on this today. It also contains no blocks on URLs on our site. For what it's worth, our robots.txt has looked like the recommended text for many, many years. You guys have a bug.

Joe Siegrist said...

There appears to be an issue with how you're interpreting robots.txt errors -- I run xmarks.com but this shows as one of my crawl errors:

e.g. translate.yandex.net/tr-url/en-ru.ru/www.xmarks.com/site/www.lastpass.com

I don't control yandex.net, this is a translation site on yandex, yet this shows as a crawl error for me despite that fact because it's doing translation, and my ads are showing on the translated page.

I'm not sure what the right answer here is -- perhaps ignoring yandex.net based errors is the answer for now.

Kenya Job Tube said...

I received an email from Google yesterday about Large Number of Failed Google AdSense Ad Crawls ...We noticed that our AdSense crawler is having difficulty crawling some parts of your site... Can someone help me with how to fix this on Blogger? Where do I add the two lines?

User-agent: Mediapartners-Google
Disallow:

admin said...

I am wondering how my robots.txt (which has had those two lines at the top for years) could block the Mediapartners crawler when the reported pages with crawler access errors are in fact image search pages of Bing.com

Cheers!

Sethisto said...

How does one modify this in blogger? You guys keep sending me messages about this, but I have no way to access my robots.txt file.

Sam Azgor said...

how to fix it ?