Last week Lynchpin - well Andrew Hood to be accurate - spoke at JUMP 2012 in London. The topic was "Big Data - Does Size matter?". Here is a synopsis of what he said (Full presentation PDF below).
If web analytics has always been “big” data are we getting diminishing returns on our accumulation of digital data? To look beyond the spin we need to consider some practical considerations vital to getting business value from this ever increasing “wealth of data”
Maybe we should start with defining "Big Data". This is the best we've seen.
Mark Whitehorn - Chair of Analytics at the University of Dundee.
This definition is balanced by the results of survey of IT professionals who clearly don't have a shared idea of what Big Data is and don't believe it will replace our old RDBMS (Relational Databases).
If we accept Mark's definition then we begin to see that web data is not actually Big Data.
To be Big Data it needs to fulfil 3 criteria these are:
If we accept these criteria then web data does not comply. It can have volume (large sites) and it does have velocity. However, it is clearly structured and hence does not comply!
So, if Big Data is not the answer to getting more value out of our online data then what is?
We need to consider and do the following
And how do we do this?
First, we need to move away from the traditional web analytics tools. As these by there very nature destroy the relationship layers because they are session based.
Secondly, we need to broaden our attribution modelling to include offline customer data.
Thirdly, we need to start using tools to allow us the freedom to deliver all the above. And, guess what? The old trusty Relational Database does exactly that. They are free and powerful.
Finally, we need to turn to statistics. “Big Data” stores do not have magical built-in analytical capabilities (exception: some standardised algorithms for things like fraud detection are emerging). Making sense of data big and small is going to need some established statistical techniques:
So we now are reaching convergence. One common complaint in digital is the struggle to recruit decent “web analysts”. By contrast, there is an established industry of data analysts with skills in: Relational databases and Statistical modelling. If less of our web data was locked up in proprietary data models, those skills suddenly become exceptionally valuable.
So in conclusion: