DokuWiki doesn’t like Google

Posted at 5:37 pm in doku wiki, google, seo

It’s a fact. There is probably a workaround but, well, it’s not a pleasant surprise.

In fact I was expecting the exact opposite.

The fact

Three days after installing DokuWiki on my host, my pages still weren’t indexed by Google.

  • I should say that Google did report a problem with robots.txt, though.

Two days after installing, after a little digging, I found that every page had a robots meta tag containing “noindex, nofollow”. DokuWiki calls it Delayed Indexing. Thankfully you can disable it in the config; otherwise you would have to wait 10 days (!!! that’s the default delay) to have a newly created or just-updated page indexed.
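For reference, here is a minimal sketch of how the delay can be switched off. It assumes the option is named indexdelay and is expressed in seconds, as in DokuWiki’s configuration documentation; check the name against your own version before relying on it.

```php
<?php
// conf/local.php (excerpt)
// Assumption: 'indexdelay' is the delayed-indexing option, in seconds.
// A value of 0 means pages are never held back with "noindex, nofollow"
// because of their age.
$conf['indexdelay'] = 0;
```

After saving the file, viewing the source of any wiki page should show whether the robots meta tag still says “noindex, nofollow”.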

Another day, another journey

That day Google Webmaster Tools reported that the crawler had visited my site again. Despite this, nothing had been indexed yet.

After some more research here’s what I found:

from http://blog.riff.org/2006_08_13_dokuwiki_vs_google, third paragraph:

Nonexistent existing pages

The third problem requires going deeper into HTTP, and think of what happens when a user agent (browser, search engine crawler) requests a non-existent page? The wiki parses the clean url, finds the page to be missing and, being a Wiki engine, it offers a new page creation dialog. Fine and dandy. However, in that case, Dokuwiki considers it has actually found a page (the “new page” creation one), and returns a HTTP 200 result. To Google, and presumably other search engines as well it means the site is trying to perform spamdexing by answering on content it doesn’t hold (now you know why bots sometimes request absurd-looking URLs from your site or submit irrelevant data to your forms). You’ll learn this when trying to validate your site on Google’s sitemap program. They even have a specific explanation page for this problem, along with various server-dependent solutions. In Dokuwiki’s case, though, the solution is simple: this is achieved using the $conf['send404'] variable in conf/local.php. The new page creation dialog will now be returned along with a 404 status.

The above was especially true for robots.txt. Since I had no robots.txt file, DokuWiki answered the request with an HTML page (so something unparsable as robots.txt) and an HTTP 200 status, forcing Google to try to parse it even though that was impossible.

I followed the instructions right away. In addition, I excluded robots.txt from the .htaccess rewrite rules.
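The setting named in the quoted paragraph would look roughly like this in conf/local.php. This is only a sketch: the option name send404 comes from the quote, while the value 1 (enabled) is my assumption based on how DokuWiki’s other on/off options are written.

```php
<?php
// conf/local.php (excerpt)
// Option name taken from the quoted post; 1 = enabled is an assumption.
// Nonexistent pages still show the "create this page" dialog,
// but are served with an HTTP 404 status instead of 200.
$conf['send404'] = 1;
```

A quick way to confirm the effect is to request a page that does not exist and look at the status line of the response headers.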

Here’s what my .htaccess file looks like now:

## Enable this to restrict editing to logged in users only

## You should disable Indexes and MultiViews either here or in the
## global config. Symlinks maybe needed for URL rewriting.
#Options -Indexes -MultiViews +FollowSymLinks

## make sure nobody gets the htaccess files
<Files ~ "^[._]ht">
    Order allow,deny
    Deny from all
    Satisfy All
</Files>

## Uncomment these rules if you want to have nice URLs using
## $conf['rewrite'] = 1 - not needed for rewrite mode 2
#RewriteEngine on
#
## Not all installations will require the following line.  If you do, 
## change "/dokuwiki" to the path to your dokuwiki directory relative
## to your document root.
#RewriteBase /dokuwiki
#
RewriteEngine On
RewriteRule ^_media/(.*)              lib/exe/fetch.php?media=$1  [QSA,L]
RewriteRule ^_detail/(.*)             lib/exe/detail.php?media=$1  [QSA,L]
RewriteRule ^_export/([^/]+)/(.*)     doku.php?do=export_$1&id=$2  [QSA,L]

RewriteRule ^$                        doku.php  [L]
RewriteCond %{REQUEST_URI}            !^/robots\.txt$
RewriteCond %{REQUEST_FILENAME}       !-f
RewriteCond %{REQUEST_FILENAME}       !-d
RewriteRule (.*)                      doku.php?id=$1  [QSA,L]
RewriteRule ^index.php$               doku.php

Will I ever be able to appear on Google like everyone else?

Update

It seems Google has indexed me. It took a month or more for them to update their index, but the DokuWiki pages of my site have begun to appear in Google search.

 

Written by Stefano Forenza on February 22nd, 2007
