What’s the link between pee in a swimming pool and Big Data?

How many times have you heard the endless buzzwords about some cool new tech that’ll let you unlock the data goldmine you’re already sitting on? Or that applying Big Data / Artificial Intelligence / Machine Learning / Deep Learning, and so on, will solve all your data challenges?

When we hear that you can just pour everything into a ‘data lake’, we start to think the hype may have gone too far. We don’t even know how to take pee out of a swimming pool, so why would you just dump data into one big swamp and expect gold to come out of it?

In Reality…

In reality, it turns out you need more than the hype and clichés that come with all the big data concepts: there are no silver bullets. None of these technologies comes as an install-and-click package, and you do not just apply AI overnight. There is work to be done and money to be invested. It requires domain expertise, infrastructure, skills, and a plan for what you want to look for.

But most importantly, you need your data in a structure and at a quality that make sense before you can apply any data science ninja tricks in a meaningful way. Only once all your data is structured, including all the relevant metadata, will you be able to take the next step and decide which technology or approach is right for you.

Ensure coherency and structure from the start

So wouldn’t it make sense to collect the data in a sensible and structured fashion as soon as it’s produced in the lab? That way, later data science discussions can focus on the right tool for the relevant questions, instead of time being wasted on endless data collection, cleaning, and merging.

This might all sound very boring and old-fashioned. But no matter the tool, technology, or company, the lessons learned from ‘we need to analyse data across xyz’ projects almost always report that the data preparation step took longer and was more complicated than expected, due to the lack of coherency and structure.

In other words

Despite what those trying to sell you the hyped technologies may claim, there are no shortcuts or novel technologies that’ll fix data that is unstructured, of poor quality, or lacking metadata. Structure, quality, and metadata need to be in place right from the very beginning. Only then can you make better use of your data assets and bring the data into decision making in a meaningful way.

Now, whether a perfectly structured data set will allow you to take pee out of a swimming pool is an entirely different question. But if you do figure that out, please let us and the world know!