
HOWTO Let Google Blog Search Access Your Full RSS Feed



If you have a popular blog that generates income through advertising, chances are you offer an RSS feed that contains only an excerpt of each entry. This is a good way to encourage readers to visit the blog to continue reading. In the same interest of drawing traffic to the site, you will also want to make the best use of Google’s new Blog Search function. Unfortunately, Google currently uses only RSS feeds (rather than readable pages) to index weblogs, meaning that in many cases only a portion of each entry is searchable.

Here are a few steps that let Movable Type generate two RSS feeds (one with excerpts for the public and one with full entries for Google) and serve both from the same URI.

This makes use of .htaccess and mod_rewrite, and I’ve tested it with Linux and Apache Server.

Step The First: Ensure You Have Two RSS Templates

Here are two samples you can use. Add these (or replace the others) as Index Templates in your blog settings.

Template Name: Public RSS 2.0 Excerpts
Output File: index.xml

<?xml version="1.0" encoding="iso-8859-1"?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title><$MTBlogName remove_html="1" encode_xml="1"$></title>
<link><$MTBlogURL$></link>
<description><$MTBlogDescription remove_html="1" encode_xml="1"$></description>
<language>en-us</language>
<dc:creator><MTEntries lastn="1"><$MTEntryAuthorDisplayName$></MTEntries></dc:creator>
<dc:rights>Copyright <$MTDate format="%Y"$></dc:rights>
<dc:date><MTEntries lastn="1"><$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$></MTEntries></dc:date>
<admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=<$MTVersion$>" />
<admin:errorReportsTo rdf:resource="mailto:<MTEntries lastn="1"><$MTEntryAuthorEmail$></MTEntries>"/>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
<MTEntries lastn="30">
<item>
<title><$MTEntryTitle remove_html="1" encode_xml="1"$></title>
<link><$MTEntryLink encode_xml="1"$></link>
<description><$MTEntryExcerpt remove_html="1" encode_xml="1"$></description>
<guid isPermaLink="false"><$MTEntryID$>@<$MTBlogURL$></guid>
<content:encoded><$MTEntryBody remove_html="1" words="100" encode_xml="1"$>
<![CDATA[<p>This is an excerpt.</p><p><a href="<$MTEntryLink$>">Read this full entry and discuss.</a><MTEntryIfAllowComments> <a href="<$MTEntryLink$>#comments" title="Comment on: <$MTEntryTitle$>">(Comments: <$MTEntryCommentCount$>)</a></MTEntryIfAllowComments></p>]]></content:encoded>
<dc:subject><$MTEntryCategory remove_html="1" encode_xml="1"$></dc:subject>
<dc:date><$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$></dc:date>
<dc:creator><$MTEntryAuthorDisplayName$></dc:creator>
</item>
</MTEntries>
</channel>
</rss>

Note: In the excerpt template above, I have chosen 100 words for the excerpt. If you would like to change the length, look for words="100" and adjust it as you see fit. I don’t use the MTEntryExcerpt tag for this because I use it as a shorter excerpt for other functions.
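To get a feel for what the remove_html="1" and words="100" attributes do to an entry body, here is a minimal Python sketch. It is only an approximation of the behavior described above, not Movable Type’s actual implementation, and the function name first_n_words is my own choice for illustration.

```python
import re


def first_n_words(text: str, n: int = 100) -> str:
    """Strip anything that looks like an HTML tag, then keep the first n words.

    This is an approximation for illustration; Movable Type's own
    filters may treat edge cases differently.
    """
    plain = re.sub(r"<[^>]+>", "", text)   # crude stand-in for remove_html="1"
    return " ".join(plain.split()[:n])     # stand-in for words="n"


body = "<p>One two three four five six</p>"
print(first_n_words(body, 4))  # prints: One two three four
```

An entry shorter than the limit passes through unchanged, so short posts appear in full even in the excerpt feed.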

Template Name: Private RSS 2.0 Full Entries
Output File: index-private.xml

<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title><$MTBlogName remove_html="1" encode_xml="1"$></title>
<link><$MTBlogURL$></link>
<description><$MTBlogDescription remove_html="1" encode_xml="1"$></description>
<language>en-us</language>
<dc:creator><MTEntries lastn="1"><$MTEntryAuthorEmail$></MTEntries></dc:creator>
<dc:rights>Copyright <$MTDate format="%Y"$></dc:rights>
<dc:date><MTEntries lastn="1"><$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$></MTEntries></dc:date>
<admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=<$MTVersion$>" />
<admin:errorReportsTo rdf:resource="mailto:<MTEntries lastn="1"><$MTEntryAuthorEmail$></MTEntries>"/>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
<MTEntries lastn="30">
<item>
<title><$MTEntryTitle remove_html="1" encode_xml="1"$></title>
<link><$MTEntryLink encode_xml="1"$></link>
<description><$MTEntryExcerpt remove_html="1" encode_xml="1"$></description>
<guid isPermaLink="false"><$MTEntryID$>@<$MTBlogURL$></guid>
<content:encoded><$MTEntryBody encode_xml="1"$>
<$MTEntryMore encode_xml="1"$>
<![CDATA[
<p><a href="<$MTEntryLink$>">Read this entry and discuss.</a></p><MTEntryIfAllowComments><p><a href="<$MTEntryLink$>#comments" title="Comment on: <$MTEntryTitle$>">Comments (<$MTEntryCommentCount$>).</a></p>
</MTEntryIfAllowComments>]]></content:encoded>
<dc:subject><$MTEntryCategory remove_html="1" encode_xml="1"$></dc:subject>
<dc:date><$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$></dc:date>
<dc:creator><$MTEntryAuthorDisplayName$></dc:creator>
</item>
</MTEntries>
</channel>
</rss>

Step The Second: Create or Edit .htaccess

This is where the magic happens. Create a module template. It should include the following at the top, before any redirects to FeedBurner or other modifications. (Thanks to Nickel for a reminder.)

Template Name: htaccess
Link this template to a file: .htaccess (Note! Remember to include the period before the name of the file.)

RewriteEngine on

RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} Technoratibot
RewriteRule ^index\.xml$ http://www.yourdomainname.com/index-private.xml [R,L]

Note: I added a line in there which redirects requests from Technorati as well, but I’m not sure if they crawl RSS feeds.
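If the rewrite rules look opaque, the decision they make can be sketched in a few lines of Python. This is only a model of the logic, assuming the two RewriteCond patterns are case-sensitive, unanchored matches against the User-Agent header (the default when no [NC] flag is given). The function name feed_for is hypothetical, and nothing here talks to Apache.

```python
import re

# These mirror the two RewriteCond lines joined by [OR].
CRAWLER_PATTERNS = [re.compile("Googlebot"), re.compile("Technoratibot")]


def feed_for(path: str, user_agent: str) -> str:
    """Return the file a client ends up fetching for a requested path.

    Models the rule: requests for index.xml from a matching crawler
    are redirected to index-private.xml; everything else is untouched.
    """
    is_crawler = any(p.search(user_agent) for p in CRAWLER_PATTERNS)
    if path == "index.xml" and is_crawler:
        return "index-private.xml"
    return path


print(feed_for("index.xml", "Googlebot/2.1"))        # prints: index-private.xml
print(feed_for("index.xml", "Mozilla/5.0 Firefox"))  # prints: index.xml
```

Note that in this model a lowercase "googlebot" would not match, which is also how I would expect the rules above to behave without a case-insensitive flag.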

Step The Third: Link to the Public Feed

Make sure your HTML index templates all contain this line in the <head> section:

<link rel="alternate" type="application/rss+xml" title="RSS/XML Syndication Feed" href="http://www.yourdomainname.com/index.xml" />
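One way to confirm that a rebuilt page really advertises the feed is to scan its HTML for the link tag. The sketch below uses Python’s standard html.parser module; the names FeedLinkFinder and find_feeds are my own, and the sample page is a made-up stand-in for one of your index pages.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect the href of every RSS autodiscovery <link> tag."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            tag == "link"
            and (attributes.get("rel") or "").lower() == "alternate"
            and attributes.get("type") == "application/rss+xml"
        ):
            self.feeds.append(attributes.get("href"))


def find_feeds(html: str) -> list:
    """Return the feed URLs advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page source standing in for a rebuilt index page.
page = (
    '<html><head><link rel="alternate" type="application/rss+xml" '
    'title="RSS/XML Syndication Feed" '
    'href="http://www.yourdomainname.com/index.xml" /></head>'
    "<body></body></html>"
)
print(find_feeds(page))  # prints: ['http://www.yourdomainname.com/index.xml']
```

An empty list means the template is missing the line.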

Step The Fourth: Rebuild!

When readers subscribe to your blog’s feed, they will see the excerpts. However, when Google crawls your site, entire entries will be indexed. You can also give the direct private URI (with the file name index-private.xml) to friends and trusted readers if you don’t mind them viewing full entries via syndication.
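To check that the redirect works after rebuilding, one option is to request the public feed while claiming to be Googlebot and see where the server sends you. The following is a rough sketch using only Python’s standard library. The names NoRedirect, build_request and redirect_target are illustrative, the URL is the same placeholder domain used above, and the user agent string "Googlebot/2.1" is simply one value the rule would match.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that presents the given User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def redirect_target(url: str, user_agent: str):
    """Return the Location header if the server redirects, else None."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, user_agent), timeout=10):
            return None  # served directly, no redirect
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        raise
```

Against a live site, calling redirect_target("http://www.yourdomainname.com/index.xml", "Googlebot/2.1") should return the index-private.xml address if the rule is active, while an ordinary browser user agent should give None.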

Updated February 6, 2012 and originally published September 15, 2005.

About the author

Luke Landes is the founder of Consumerism Commentary. He has been blogging and writing for the internet since 1995 and has been building online communities since 1991.


{ 8 comments }

Luke Landes

Another tip: change "30" in the line <MTEntries lastn="30"> (in the private RSS template) to another number to give Google more entries. I don’t know if there’s a benefit, but it couldn’t hurt.

Jonathan@MyMoneyBlog

Great idea, I’m assuming other people can’t read your .htaccess files?

Luke Landes

That’s correct, the .htaccess file cannot be viewed over the web.

fivecentnickel.com

Is there a good way of testing this out to make sure that it works once you set it up? For example, is there some way of spoofing the googlebot user agent to see if your conditional redirect works?

Luke Landes

There is an extension for Mozilla and Firefox that allows you to spoof any browser software. Just tested it out here… it works! :>

fivecentnickel.com

I couldn’t get it to work. How did you configure the extension? I just put Googlebot for the description and user agent and left the other fields blank.

Luke Landes

I answered this by email. If you have User Agent Switcher and you want to spoof Googlebot, use these settings:

Description: Googlebot
User Agent: Googlebot/2.1
App name: Googlebot
Version: 2.1

Everything else can be left blank.

fivecentnickel.com

Okay, it works. Just be sure to put the conditional redirect ahead of any other redirects of feed requests (such as redirecting to FeedBurner). Otherwise the request will get shunted off to FeedBurner before the Googlebot redirect gets evaluated. Thanks!
