HOWTO Let Google Blog Search Access Your Full RSS Feed

If you have a popular blog that generates income through advertising, chances are you offer an RSS feed that contains only an excerpt of each entry. This is a good way to encourage readers to visit the blog to continue reading. In the same interest of drawing traffic to the site, you will also want to make the best use of Google’s new Blog Search function. Unfortunately, Google currently uses only RSS feeds (rather than readable pages) to index weblogs, meaning that in many cases only a portion of each entry is searchable.

Here are a few steps that let Movable Type generate two RSS feeds, one with excerpts for the public and one with full entries for Google, and make both reachable through the same URI.

This makes use of .htaccess and mod_rewrite, and I’ve tested it with Linux and Apache Server.



Step The First: Ensure You Have Two RSS Templates

Here are two samples you can use. Add these as Index Templates in your blog settings, or use them to replace your existing RSS templates.

Template Name: Public RSS 2.0 Excerpts
Output File: index.xml


xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">


<$MTBlogURL$>
<$MTBlogDescription remove_html="1" encode_xml="1"$>
en-us
<$MTEntryAuthorDisplayName$>
Copyright <$MTDate format="%Y"$>
<$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$>
<$MTEntryAuthorEmail$>"/>
hourly
1
2000-01-01T12:00+00:00



<$MTEntryLink encode_xml="1"$>
<$MTEntryExcerpt remove_html="1" encode_xml="1"$>
<$MTEntryID$>@<$MTBlogURL$>
<$MTEntryBody remove_html="1" words="100" encode_xml="1"$>
This is an excerpt.


<$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$>
<$MTEntryAuthorDisplayName$>



Note: In the excerpt template above, I have chosen 100 words for the excerpt. To change the length, look for words="100" and adjust it as you see fit. I don’t use the MTEntryExcerpt tag here because it is already used as a shorter excerpt for other functions.
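To get a feel for what the remove_html, words, and encode_xml attributes do to an entry body, here is a rough Python approximation. It is only a sketch: the function name `excerpt` is hypothetical, the order in which the steps are applied is an assumption, and Movable Type's own implementation may differ in details such as entity handling.

```python
import re
from xml.sax.saxutils import escape

# Naive tag matcher: anything between "<" and the next ">".
TAG_RE = re.compile(r"<[^>]+>")


def excerpt(entry_body, words=100):
    """Approximate remove_html="1" words="100" encode_xml="1" (hypothetical helper)."""
    text = TAG_RE.sub("", entry_body)   # remove_html: drop markup
    kept = text.split()[:words]         # words: keep the first N whitespace-separated words
    return escape(" ".join(kept))       # encode_xml: escape &, < and > (assumed behavior)


# Example call with a short limit:
# excerpt("<p>Hello <b>world</b> and friends</p>", words=2)
```

Entries shorter than the limit pass through whole, so raising the number only affects long entries.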

Template Name: Private RSS 2.0 Full Entries
Output File: index-private.xml


xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">


<$MTBlogURL$>
<$MTBlogDescription remove_html="1" encode_xml="1"$>
en-us
<$MTEntryAuthorEmail$>
Copyright <$MTDate format="%Y"$>
<$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$>
<$MTEntryAuthorEmail$>"/>
hourly
1
2000-01-01T12:00+00:00



<$MTEntryLink encode_xml="1"$>
<$MTEntryExcerpt remove_html="1" encode_xml="1"$>
<$MTEntryID$>@<$MTBlogURL$>
<$MTEntryBody encode_xml="1"$>
<$MTEntryMore encode_xml="1"$>

#comments" title="Comment on: <$MTEntryTitle$>">Comments (<$MTEntryCommentCount$>).


]]>
<$MTEntryCategory remove_html="1" encode_xml="1"$>
<$MTEntryDate format="%Y-%m-%dT%H:%M:%S"$><$MTBlogTimezone$>
<$MTEntryAuthorDisplayName$>



Step The Second: Create or Edit .htaccess

This is where the magic happens. Create a module template. It should include the following at the top, before any redirects to FeedBurner or other modifications. (Thanks to Nickel for a reminder.)

Template Name: htaccess
Link this template to a file: .htaccess (Note! Remember to include the period before the name of the file.)

RewriteEngine on

RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} Technoratibot
RewriteRule ^index\.xml$ http://www.yourdomainname.com/index-private.xml [R,L]
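
To reason about which requests these lines redirect, the matching logic can be modeled in a few lines of Python. This is a sketch of the logic only, not how Apache evaluates it. The names `is_crawler` and `resolve_feed` are hypothetical, and the model assumes the conditions behave as unanchored, case-sensitive regular expressions, which is how RewriteCond patterns work when no flag such as [NC] is given.

```python
import re

# Patterns copied from the two RewriteCond lines, which are joined by [OR].
CRAWLER_PATTERNS = [re.compile(r"Googlebot"), re.compile(r"Technoratibot")]

# Pattern from the RewriteRule. In a top-level .htaccess the path is matched
# without its leading slash.
FEED_PATTERN = re.compile(r"^index\.xml$")

# Placeholder target, exactly as written in the rule.
PRIVATE_FEED = "http://www.yourdomainname.com/index-private.xml"


def is_crawler(user_agent):
    """True if either condition matches anywhere in the User-Agent string."""
    return any(p.search(user_agent) for p in CRAWLER_PATTERNS)


def resolve_feed(path, user_agent):
    """Return the redirect target, or None when the request is served as-is."""
    if FEED_PATTERN.search(path) and is_crawler(user_agent):
        return PRIVATE_FEED
    return None
```

Because the match is a substring match, a longer user agent string that merely contains "Googlebot" is redirected too, while a lowercase "googlebot" is not.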

Note: I added a line in there which redirects requests from Technorati as well, but I’m not sure if they crawl RSS feeds.
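
One way to confirm that the conditional redirect works is to request the feed with a spoofed User-Agent header and inspect the response without following redirects. The Python sketch below does that with the standard library. The function name `redirect_target` is hypothetical, and the URL in the usage comment is the placeholder domain from the rule, so substitute your own.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_target(url, user_agent):
    """Send one GET with the given User-Agent; return the Location header
    if the server answers with a redirect, otherwise None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        # http.client never follows redirects, so we see the raw answer.
        conn.request("GET", parts.path or "/", headers={"User-Agent": user_agent})
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            return response.getheader("Location")
        return None
    finally:
        conn.close()


# Usage (placeholder domain):
# print(redirect_target("http://www.yourdomainname.com/index.xml", "Googlebot/2.1"))
# print(redirect_target("http://www.yourdomainname.com/index.xml", "Mozilla/5.0"))
```

If the rule is active, the first call should report the private feed's address and the second should report no redirect.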

Step The Third:

Make sure your HTML index templates all contain this line in the section:

9 Comments on “HOWTO Let Google Blog Search Access Your Full RSS Feed”

  1. Flexo
    Comment #1 on Thursday, September 15, 2005
    9:08 pm

    Another tip: change “30” in the line (in the private RSS template) to another number to give Google more entries. I don’t know if there’s a benefit, but it couldn’t hurt.

  2. Jonathan@MyMoneyBlog
    Comment #2 on Thursday, September 15, 2005
    10:27 pm

    Great idea, I’m assuming other people can’t read your .htaccess files?

  3. Flexo
    Comment #3 on Thursday, September 15, 2005
    10:30 pm

    That’s correct, the .htaccess file cannot be viewed over the web.

  4. fivecentnickel.com
    Comment #4 on Friday, September 16, 2005
    12:47 am

    Is there a good way of testing this out to make sure that it works once you set it up? For example, is there some way of spoofing the googlebot user agent to see if your conditional redirect works?

  5. Flexo
    Comment #5 on Friday, September 16, 2005
    1:05 am

    There is an extension for Mozilla and Firefox that allows you to spoof any browser software. Just tested it out here… it works! :>

  6. Irishwonder.syndk8.co.uk
    Comment #6 on Friday, September 16, 2005
    8:26 am

  7. fivecentnickel.com
    Comment #7 on Friday, September 16, 2005
    10:33 am

    I couldn’t get it to work. How did you configure the extension? I just put Googlebot for the description and user agent and left the other fields blank.

  8. Flexo
    Comment #8 on Friday, September 16, 2005
    4:16 pm

    I answered this by email. If you have User Agent Switcher and you want to spoof Googlebot, use these settings:

    Description: Googlebot
    User Agent: Googlebot/2.1
    App name: Googlebot
    Version: 2.1

    Everything else can be left blank.

  9. fivecentnickel.com
    Comment #9 on Friday, September 16, 2005
    6:52 pm

    Okay, it works. Just be sure to put the conditional redirect ahead of any other redirects of feed requests (such as redirecting to FeedBurner). Otherwise the request will get shunted off to FeedBurner before the Googlebot redirect gets evaluated. Thanks!
