This blog moved back to the main Céondo Ltd blog, within its own section. The content will be slowly reimported into the main blog. You can subscribe to the feed here.
It makes more sense to have everything in one place.
Wednesday 26 August 2009
By Loïc d'Anterroches on Wednesday 26 August 2009, 11:27 - Analytics
Monday 10 August 2009
By Loïc d'Anterroches on Monday 10 August 2009, 11:49 - Analytics
As explained in my unique visitor tracking experiment, I flag as a bot any visitor whose first request is for the robots.txt file, and I also run a simple user agent string analysis. The following function determines whether a visitor is a bot:
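function isBot($user_agent)
{
    // Case-insensitive substrings that only show up in crawler user agents.
    static $bots = array('robot', 'checker', 'crawl', 'discovery',
                         'hunter', 'scanner', 'spider', 'sucker', 'larbin',
                         'slurp', 'libwww', 'lwp', 'yandex', 'netcraft',
                         'wget');
    // "bot" followed or preceded by a separator (e.g. "msnbot/2.0b",
    // "Googlebot-Mobile"), or a bare "Mozilla/x.0" with nothing else.
    static $pbots = array('/bot[\s_+:,\.\;\/\\\-]/i',
                          '/[\s_+:,\.\;\/\\\-]bot/i',
                          '/^\s*Mozilla\/\d\.0\s*$/i');
    foreach ($bots as $r) {
        if (false !== stristr($user_agent, $r)) {
            return true;
        }
    }
    foreach ($pbots as $p) {
        if (preg_match($p, $user_agent)) {
            return true;
        }
    }
    // Real browsers put platform details in parentheses,
    // e.g. "(Windows; U; Windows NT 6.1; en-US)".
    if (false === strpos($user_agent, '(')) {
        return true;
    }
    return false;
}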
It is working pretty well at the moment, and I am improving it regularly based on my logs. Maybe a more rigorous approach based on publicly available bot name data would be even better.
Friday 7 August 2009
By Loïc d'Anterroches on Friday 7 August 2009, 14:55 - Analytics
Now that I have a fairly robust way to track the unique visitors of my website, I need to find a way to use it. As I said earlier, I want to perform some split testing.
In short, you randomly split your user base into 2 or more groups and serve each group a given version of your page or website. You also define some conversion goals, for example accessing a given page, and then you track which group has the best conversion rate.
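The final comparison step can be sketched as follows. This is a minimal illustration, assuming each unique visitor's group and conversion outcome have already been logged; the record layout and the conversionRates() helper are hypothetical, not the schema used on indefero.net.

```php
<?php
// Hypothetical record layout: [group name, converted?] for one unique visitor.
function conversionRates(array $records): array
{
    $visitors = [];
    $conversions = [];
    foreach ($records as [$group, $converted]) {
        $visitors[$group] = ($visitors[$group] ?? 0) + 1;
        $conversions[$group] = ($conversions[$group] ?? 0) + ($converted ? 1 : 0);
    }
    // Conversion rate = converted visitors / visitors in the group.
    $rates = [];
    foreach ($visitors as $group => $count) {
        $rates[$group] = $conversions[$group] / $count;
    }
    return $rates;
}

$records = [['A', true], ['A', false], ['A', false], ['A', true],
            ['B', true], ['B', false], ['B', false], ['B', false]];
$rates = conversionRates($records);
arsort($rates); // best converting group first
print_r($rates);
```

With these made-up records, group A converts 2 of 4 visitors (0.5) and group B 1 of 4 (0.25). A real test also needs enough visitors per group before the difference means anything.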
If you want to learn more, Jesse Farmer has a good introduction to A/B testing.
You will need to define:
Of course, you need to be able to track the individual users accessing your website and as such you need a good unique user tracking method.
The test case should store:
The conversion goal:
The alternatives:
For the treatments, you need to think a bit further than A/B testing and go into multivariate testing. Instead of testing one alternative against another (A or B), multivariate testing tests several alternatives on the same page/website at the same time. It is basically an extended version of the A/B testing method: you can consider it as a huge A/B test whose alternatives are all the combinations of elements.
Imagine you have a webpage with a banner and a title (2 slots, one banner and one title) and you want to test 2 different banners B1 and B2 and 3 different titles T1 to T3. This means you have 6 possible treatments: B1T1, B1T2, B1T3, B2T1, B2T2 and B2T3.
You could skip defining the slots and directly create the 6 alternatives, but that would take some manual work, and you would lose the ability to do a bit of statistical kung fu to reduce the number of tests needed to get meaningful results.
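Defining slots makes the list of treatments mechanical: it is the Cartesian product of the alternatives of each slot. A minimal sketch, where the slot names 'banner' and 'title' and the treatments() helper are hypothetical and simply mirror the B1/B2, T1 to T3 example:

```php
<?php
// Expand slots (slot name => list of alternatives) into every treatment.
function treatments(array $slots): array
{
    $combinations = [[]]; // start with a single empty combination
    foreach ($slots as $slot => $alternatives) {
        $next = [];
        foreach ($combinations as $combination) {
            foreach ($alternatives as $alternative) {
                // Add this slot's alternative to the partial combination.
                $next[] = $combination + [$slot => $alternative];
            }
        }
        $combinations = $next;
    }
    return $combinations;
}

$all = treatments(['banner' => ['B1', 'B2'], 'title' => ['T1', 'T2', 'T3']]);
foreach ($all as $t) {
    echo implode('', $t), "\n";
}
```

This prints B1T1, B1T2, B1T3, B2T1, B2T2 and B2T3, one per line. Adding a third slot multiplies the count again, which is why the statistical shortcuts matter.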
So, a treatment should store:
Now you need to deliver the treatment to the users and log which treatment has been given to whom and what the outcome was.
The treatment log should store:
The conversion log should store:
This structure allows us to track different goals at the same time, which can save time.
The workflow is relatively simple, you basically do the following:
On each page, check if an active goal is matched; if so, log the conversion for this visitor.
What is important is to never serve 2 different treatments to a given user.
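The delivery and goal-checking steps can be sketched with in-memory arrays standing in for the treatment log and the conversion log. Everything here is an assumption for illustration: the SplitTest class, its method names, and the simplification of a goal to "the visitor reached a given page". The key point is that a treatment is drawn at random only once per visitor and then reused.

```php
<?php
final class SplitTest
{
    private array $treatments;
    private array $treatmentLog = [];  // visitor id => treatment (assigned once)
    private array $conversionLog = []; // "visitor|goal" => [visitor, treatment, goal]

    public function __construct(array $treatments)
    {
        $this->treatments = $treatments;
    }

    // Pick a random treatment on the first visit only; reuse it afterwards.
    public function treatmentFor(string $visitor): string
    {
        if (!isset($this->treatmentLog[$visitor])) {
            $this->treatmentLog[$visitor] = $this->treatments[array_rand($this->treatments)];
        }
        return $this->treatmentLog[$visitor];
    }

    // Log at most one conversion per visitor and goal page.
    public function checkGoal(string $visitor, string $page, string $goalPage): void
    {
        if ($page !== $goalPage || !isset($this->treatmentLog[$visitor])) {
            return;
        }
        $this->conversionLog[$visitor . '|' . $goalPage] =
            [$visitor, $this->treatmentLog[$visitor], $goalPage];
    }

    public function conversions(): array
    {
        return array_values($this->conversionLog);
    }
}

$test = new SplitTest(['B1T1', 'B1T2', 'B2T1']);
$first = $test->treatmentFor('visitor-42');
$again = $test->treatmentFor('visitor-42'); // same treatment as $first
$test->checkGoal('visitor-42', '/tour.html', '/tour.html');
```

In production the two arrays would be database tables, and the visitor id would come from the unique visitor tracking.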
I will implement this to run some fun tests on the homepage of indefero.net and report the results to you.
By Loïc d'Anterroches on Friday 7 August 2009, 14:01 - Analytics
Now that I have a full day of records, I can start to get some statistics from my unique visitor tracking test. To get the number of unique visitors who came to the website yesterday, I just run:
SELECT COUNT(*) FROM
    (SELECT DISTINCT "user" FROM indefero_idfa_tracker_logs
     LEFT JOIN indefero_idfa_tracker_users
         ON indefero_idfa_tracker_users.id = "user"
     WHERE bot IS FALSE
       AND DATE(indefero_idfa_tracker_logs.creation_dtime) = CURRENT_DATE - 1
     GROUP BY "user")
    AS foo;
This gives me 151 unique visitors. A quick check of the indefero_idfa_tracker_users table revealed 2 unflagged bots, so 149 unique visitors. Now, let's ask Google Analytics: it reports 120 absolute unique visitors, about 20% fewer ((149 - 120) / 149 ≈ 19.5%).
Note that the tracking is running on indefero.net, a website targeted at geeks. For this demographic, I think a 20% difference is not that big: it means that only about 20% of the visitors block the GA tracker code. Anyway, I am really happy with the results, as they show that the tracking is working well. My backend is PostgreSQL; with MySQL you may need to adapt the date arithmetic in the query.
Bonus: changing bot IS FALSE to bot IS TRUE gives me 60 bots and crawlers.
Thursday 6 August 2009
By Loïc d'Anterroches on Thursday 6 August 2009, 13:14 - Analytics
So the unique visitor tracking test is running. At the time of writing, I have 159 unique visitors in my visitor table. From the excerpt of the results shown below, it is clear that I need to flag the bots and crawlers and exclude them from the page tracking.
id | User agent
82 | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_7; en-us) AppleWeb[...]
83 | Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/530.5[...]
84 | msnbot/2.0b (+http://search.msn.com/msnbot.htm)
85 | DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1;[...]
86 | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.2) Gec[...]
id | Visitor | Page
341 | 83 | /
342 | 83 | /tour.html
343 | 56 | /refund/
344 | 56 | /privacy/
345 | 84 | /robots.txt
346 | 84 | /
347 | 85 | /robots.txt
348 | 85 | /
349 | 84 | /refund/
The good thing is that robots and crawlers are good Internet citizens: as you can see with the MSN bot (id 84) and Googlebot-Mobile (id 85), their first request is for the robots.txt file. This means that a new visitor can be flagged as a bot directly if its first request is for robots.txt.
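Applied to the log excerpt above, the robots.txt rule can be sketched like this. The input layout (visitor id => paths in request order) and the flagRobotsTxtBots() helper are assumptions for illustration; in a live tracker the same test would simply run when a new visitor's first hit is logged.

```php
<?php
// Flag a visitor as a bot when its very first request is /robots.txt.
function flagRobotsTxtBots(array $requests): array
{
    $flags = [];
    foreach ($requests as $visitor => $paths) {
        $flags[$visitor] = isset($paths[0]) && $paths[0] === '/robots.txt';
    }
    return $flags;
}

// Visitors 83, 84 and 85 from the page log excerpt.
$requests = [
    83 => ['/', '/tour.html'],
    84 => ['/robots.txt', '/', '/refund/'],
    85 => ['/robots.txt', '/'],
];
$flags = flagRobotsTxtBots($requests);
```

Visitor 83 stays a human, while 84 (msnbot) and 85 (Googlebot-Mobile) are flagged.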
Now, this will kick out most of the bots and crawlers, but not ones like this:
70 | Mozilla/5.0
A very minimal user agent string.
303 | 70 | /doc.html//?_SERVER[DOCUMENT_ROOT]=http://www.[...]
304 | 70 | /
305 | 70 | //?_SERVER[DOCUMENT_ROOT]=http://www.[...]
306 | 70 | /doc.html//?_SERVER[DOCUMENT_ROOT]=http://www.[...]
307 | 70 | /
308 | 70 | /doc.html//?_SERVER[DOCUMENT_ROOT]=http://www.[...]
309 | 70 | //?_SERVER[DOCUMENT_ROOT]=http://www.[...]
310 | 70 | /
This visitor is obviously trying to break into my site. I already do not log the visitors without a user agent string, but it looks like I will need to use the AWStats heuristics to flag more visitors as bots.
So the plan is to flag the visitors whose first request is /robots.txt as crawlers/bots. I am going to work on that this afternoon and will report the results to you.
Wednesday 5 August 2009
By Loïc d'Anterroches on Wednesday 5 August 2009, 23:41 - Analytics
In my previous post, I wrote about unique user session tracking. Here is what I ended up creating to implement it in practice. This approach is being tested by tracking the unique visitors on www.indefero.net. I will then cross-check the results with the Google Analytics data of the account to assess the quality of the idea.
The storage is composed of 2 tables, one for the visitors and one for the logs. The visitor table is needed because the goal is to track the unique visitors in real time. To reduce the lookups in this visitor table, information is cached using Memcached.
The visitor table stores:
The log table stores:
To find a visitor in the visitor table, I first search by cookie and, if that fails, by the user agent/IP address combination. The real trick is handling the missing cookie. In my case, I log just before sending the response, so a new visitor or a visitor without a cookie has already been given a new cookie. When looking up the visitor in the table, if the user agent/IP matches but not the cookie, I update the cookie in the table, because I have no idea whether the visitor will now accept the cookie or not. Doing this update on every request from a visitor who refuses cookies could be a performance problem.
Basically, I first perform a cookie check and then fall back on the user agent/IP address combination. This is running at the moment on indefero.net (only the presentation website, not the hosted forges), and I will compare the results with the Google Analytics results in 24 or 48h. What is already better than GA is that I can see the bots. Maybe I should add a bot flag in the visitor table to easily exclude them from reports.
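The lookup order described above can be sketched with arrays standing in for the visitor table. The VisitorStore class, findOrCreateVisitor() and the integer ids are hypothetical, and the Memcached layer is left out. The caller passes either the cookie sent by the browser or the fresh one about to be set in the response.

```php
<?php
final class VisitorStore
{
    private array $byCookie = [];  // cookie => visitor id
    private array $byAgentIp = []; // "user agent|ip" => visitor id
    private array $cookieOf = [];  // visitor id => current cookie
    private int $nextId = 1;

    public function findOrCreateVisitor(string $cookie, string $agent, string $ip): int
    {
        // 1. Cookie match wins.
        if (isset($this->byCookie[$cookie])) {
            return $this->byCookie[$cookie];
        }
        // 2. Fall back on the user agent/IP combination and store the new
        //    cookie, since we cannot know whether it will be accepted.
        $key = $agent . '|' . $ip;
        if (isset($this->byAgentIp[$key])) {
            $id = $this->byAgentIp[$key];
            unset($this->byCookie[$this->cookieOf[$id]]);
            $this->byCookie[$cookie] = $id;
            $this->cookieOf[$id] = $cookie;
            return $id;
        }
        // 3. Otherwise this is a new visitor.
        $id = $this->nextId++;
        $this->byCookie[$cookie] = $id;
        $this->byAgentIp[$key] = $id;
        $this->cookieOf[$id] = $cookie;
        return $id;
    }
}

$store = new VisitorStore();
$ua = 'Mozilla/5.0 (X11; Linux x86_64)';
$a = $store->findOrCreateVisitor('c1', $ua, '192.0.2.1');        // new visitor
$b = $store->findOrCreateVisitor('c2', $ua, '192.0.2.1');        // cookie refused: UA/IP match
$c = $store->findOrCreateVisitor('c2', 'Other', '198.51.100.7'); // cookie match
```

All three calls resolve to the same visitor: the second through the user agent/IP fallback, the third through the updated cookie even though its IP changed.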
By Loïc d'Anterroches on Wednesday 5 August 2009, 10:05 - Analytics
Goal of the day (or maybe months): 300% increase of my conversion rates.
How to do that: Split testing.
What is needed: Track the unique user sessions of the website in real time.
So, how do you track the unique visitors on your website? I must say, it looks like black magic. I took the time to read the AWStats code but was not able to understand it: my Perl fluency is far in the past, and the code is written entirely with speed in mind, not readability.
The Wikipedia page on web analytics provides this definition:
[A unique user is] an IP address plus a further identifier. Sites may use User Agent, Cookie and/or Registration ID.
Good, so if I want to track my users, I need to use the IP address (easy) plus a user agent (easy), a cookie (not so easy) or a registration id (not possible in my case).
Cookies are optional. In Firefox, I have an extension that disables all cookies except for the websites I trust.
So if you consider that a distinct (cookie, IP) pair identifies a unique user, each page I access on your website will count as a new unique user: I never send your cookie back, so you give me a new one on every request.
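A small sketch makes the inflation visible. The hits are made up (one cookie-refusing visitor viewing three pages), and countDistinct() is a hypothetical helper that counts distinct combinations of the chosen fields.

```php
<?php
// Count distinct combinations of the given fields across all hits.
function countDistinct(array $hits, array $fields): int
{
    $seen = [];
    foreach ($hits as $hit) {
        $key = implode('|', array_map(fn($f) => $hit[$f], $fields));
        $seen[$key] = true;
    }
    return count($seen);
}

// One visitor, three pages, a fresh cookie each time because it is refused.
$hits = [];
foreach (['/', '/tour.html', '/doc.html'] as $i => $page) {
    $hits[] = ['cookie' => 'fresh-' . $i, 'ip' => '192.0.2.1',
               'agent' => 'Mozilla/5.0 (Windows; U)', 'page' => $page];
}
echo countDistinct($hits, ['cookie', 'ip']), "\n"; // 3
echo countDistinct($hits, ['ip', 'agent']), "\n";  // 1
```

The (cookie, IP) key sees three users, while the (IP, user agent) key correctly sees one.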
Yes, I need to test it, and the way to do that is to implement it and compare the results with what Google Analytics gives me.
A unique user session is a combination of:
This approach means that if I do not have a unique cookie and a set of users comes from the same connection with the same browser, they will be counted as a single unique user.
Is it a problem? Not really. Why? Because the goal is to perform split testing, so what I need is a lower bound on the number of unique users and the ability to mark at least 50% of them for the split test. As long as I can identify a good fraction of the users with the cookie, I will be happy.
Here are more ideas to explore the tracking without cookies.
I am a PHP shop, but you can do it in any language. What you need is simply a database and a fast in-memory store (APC or memcached).
The fast memory storage is to avoid hitting the database at each request and the database is of course to get a bit of persistence. The memory storage expires the value after your desired session time (30 minutes), this automatically takes care of the active session length handling.
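The session expiry can be sketched with a tiny array-backed cache standing in for APC or memcached (both accept a TTL when storing a value). The TtlCache class, touchSession(), the injectable clock and the choice to refresh the TTL on every hit (an inactivity timeout) are assumptions for illustration.

```php
<?php
final class TtlCache
{
    private array $items = []; // key => [value, expiry timestamp]

    public function set(string $key, $value, int $ttl, int $now): void
    {
        $this->items[$key] = [$value, $now + $ttl];
    }

    public function get(string $key, int $now)
    {
        if (!isset($this->items[$key]) || $this->items[$key][1] <= $now) {
            unset($this->items[$key]); // missing or expired
            return null;
        }
        return $this->items[$key][0];
    }
}

const SESSION_TTL = 1800; // 30 minutes

// True when the hit opens a new session (cache miss); always refreshes the TTL.
function touchSession(TtlCache $cache, string $visitorKey, int $now): bool
{
    $isNew = $cache->get($visitorKey, $now) === null;
    $cache->set($visitorKey, true, SESSION_TTL, $now);
    return $isNew;
}

$cache = new TtlCache();
$t0 = 1249480000;
$s1 = touchSession($cache, 'visitor-42', $t0);              // new session
$s2 = touchSession($cache, 'visitor-42', $t0 + 600);        // 10 min later: same session
$s3 = touchSession($cache, 'visitor-42', $t0 + 600 + 1801); // idle > 30 min: new session
```

Only a cache miss needs a database write, which is what keeps the database off the hot path.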
The workflow is as follows for a visit without a cookie:
For a user with a cookie:
The tracking must be performed in real time. This is why it is not possible to use the referrer information to follow the path of the user and to distinguish the users accessing the website with the same IP/user agent. Anyway, it looks like no single solution will be optimal; the answer is probably an adaptive algorithm that gives a probability of "uniqueness" for a hit by combining several methods.
Tuesday 4 August 2009
By Loïc d'Anterroches on Tuesday 4 August 2009, 20:53
This is a blog about Céondo Ltd. Céondo is a small software vendor run by myself, Loïc d'Anterroches. I have decided to dive more into the ISV dark side: marketing.
Marketing is something I am not good at, really not good. But I know I am not completely stupid, so I decided to learn, and I will share my findings with you. The goal is to have a better understanding of the basics of marketing, website conversion optimisation and viral marketing by the end of 2009.
How am I going to achieve that? Simple: instead of coding 9h30 a day and doing 30 minutes of marketing, I am going to code 5h and market 5h.