As explained in my attempt to track unique visitors on my website, I analyze accesses to the robots.txt file, and I also run a simple analysis of the user agent string. The following function determines whether a visitor is a bot:

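    /**
     * Guess whether a user agent string belongs to a bot.
     */
    function isBot($user_agent)
    {
        // Substrings that mark a bot (matched case-insensitively).
        static $bots = array('robot', 'checker', 'crawl', 'discovery',
                             'hunter', 'scanner', 'spider', 'sucker', 'larbin',
                             'slurp', 'libwww', 'lwp', 'yandex', 'netcraft',
                             'wget');
        // "bot" next to a separator character, or a bare "Mozilla/N.0" string.
        static $pbots = array('/bot[\s_+:,\.\;\/\\\-]/i',
                              '/[\s_+:,\.\;\/\\\-]bot/i',
                              '/^\s*Mozilla\/\d\.0\s*$/i');
        foreach ($bots as $r) {
            if (false !== stripos($user_agent, $r)) {
                return true;
            }
        }
        foreach ($pbots as $p) {
            if (preg_match($p, $user_agent)) {
                return true;
            }
        }
        // Regular browsers normally include a parenthesised comment
        // (platform details); a user agent without one is treated as a bot.
        if (false === strpos($user_agent, '(')) {
            return true;
        }
        return false;
    }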

It works pretty well at the moment, and I improve it regularly based on my logs. A more rigorous approach based on publicly available bot-name data might be even better.
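
As a rough idea of what such a list-driven check could look like, here is a minimal sketch. It assumes the bot names are available as plain text, one name fragment per line. The helper names `parseBotNames` and `isListedBot`, the file format, the file name `bot-names.txt` and the sample entries are all assumptions made for illustration; they do not refer to any particular published list.

```php
<?php
/**
 * Hypothetical helper: parse a plain-text list of bot-name fragments,
 * one per line. Blank lines and lines starting with '#' are skipped.
 */
function parseBotNames($text)
{
    $names = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $names[] = $line;
    }
    return $names;
}

/**
 * Hypothetical helper: return true if the user agent contains any of
 * the given name fragments (case-insensitive substring match).
 */
function isListedBot($user_agent, array $names)
{
    foreach ($names as $name) {
        if (false !== stripos($user_agent, $name)) {
            return true;
        }
    }
    return false;
}

// Illustrative data only. A real list would be loaded from a downloaded
// file, e.g. parseBotNames(file_get_contents('bot-names.txt')).
$names = parseBotNames("# sample entries\nGooglebot\nbingbot\n\nYandexBot\n");

// A crawler-style user agent and a browser-style one.
var_dump(isListedBot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $names)); // bool(true)
var_dump(isListedBot('Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0', $names));   // bool(false)
```

A check like this could run before or after the heuristics in `isBot`: the list catches known names exactly, while the heuristics still cover bots that are not on any list.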