Why should I become a Data Scientist? – Big Data's pipeline for beginners

‘Another article about Big Data? Seriously, another one?’


We are tired of reading about Big Data here and there. Everyone talks about it, but few people actually explain what this whole new universe consists of. That’s because Big Data does not have just one final goal. Depending on where you get your large volume of data and what type of data you are using, you can do anything from predicting whether a person will develop diabetes in the future to showing you what you’re looking for before you even realise you’re missing it (yes, Amazon uses Big Data, like… a lot).



‘All this seems very good to me, but how is it done?’


You are stunning, but Big Data is OSUMING. Why are you stunning? As a curiosity, in the business world the data scientist role is known as the blue unicorn role, because people who can do it are so rare. Personally, I prefer to say we are like detectives, or rather detective padawans, since we are beginners. We have to use our powers of deduction to find patterns in the data and predict its behaviour.


Also, we don’t just dive into Big Data without knowing what we are doing. There are six simple steps you can follow to succeed as quickly as possible.

O – Obtain the data.

S/C – Scrubbing/cleaning the data: removing or fixing null (missing) and wrong values.

U – Understanding the data in order to find the patterns we were talking about earlier.

M – Modelling the data so we can put our powers of deduction to work.

I – Interpreting the results.

G – Get the success (and the money, we all like that step).
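To make the six steps a bit more concrete, here is a minimal sketch of them in Python (the article itself doesn’t tie the pipeline to any language). Everything in it is a hypothetical illustration: the tiny made-up dataset of passengers and temperatures in an underground carriage, the column names, the “plausible temperature” range used for cleaning, and the function names obtain, scrub, understand, model and interpret. A simple straight-line least-squares fit stands in for the modelling step, and the G step is left to you.

```python
import csv
import io
import math
from statistics import mean

# Hypothetical toy dataset (made up for illustration): passengers in an
# underground carriage vs. the temperature inside it, in degrees Celsius.
RAW_CSV = """passengers,temperature
10,21.0
20,22.5
,23.0
30,24.0
40,25.5
abc,26.0
50,27.0
60,-999
"""


def obtain(text):
    """O: read the raw rows (from a string here, instead of a real data source)."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """S/C: keep only rows whose values are present, numeric and plausible."""
    clean = []
    for row in rows:
        try:
            x = float(row["passengers"])
            y = float(row["temperature"])
        except (TypeError, ValueError):  # missing cell or text such as "abc"
            continue
        # Assumed plausible range; -999 looks like a sensor error code.
        if -50 <= y <= 60:
            clean.append((x, y))
    return clean


def understand(points):
    """U: summarise the data and measure how strongly x and y move together."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return {
        "n": len(points),
        "mean_passengers": mx,
        "mean_temperature": my,
        "correlation": sxy / math.sqrt(sxx * syy),  # Pearson's r
    }


def model(points):
    """M: fit temperature = intercept + slope * passengers by least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def interpret(intercept, slope):
    """I: turn the fitted numbers into a sentence someone can act on."""
    return (f"Each extra passenger adds about {slope:.2f} degrees C; "
            f"an empty carriage would sit at roughly {intercept:.1f} degrees C.")


if __name__ == "__main__":
    points = scrub(obtain(RAW_CSV))
    print(understand(points))
    intercept, slope = model(points)
    print(interpret(intercept, slope))
```

On real data each of these functions would be far bigger, and the cleaning rules in scrub would come from what you learn while understanding the data, not from a guess made up front.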


The most time-consuming part is understanding our data. If you really want to find hidden patterns or reach your goal, understanding the data should be your top priority. Finding out the best place to install air conditioning in underground train carriages is not the same as finding out how the economy is evolving. Roughly 80% of your time will go into understanding the data. As beginners, we will find modelling easier with Big Data software such as KNIME or R. R involves more programming, but nothing you won’t be able to achieve.
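Since understanding the data takes most of the time, a common first pass is to profile every column: how many values are missing, how many don’t look the way they should, and what range the numbers cover. The sketch below does that with Python’s standard library rather than KNIME or R; the sample rows, the column names (line, passengers, temperature) and the profile function are all made up for illustration.

```python
import csv
import io
from statistics import mean, median

# Hypothetical sample (made up for illustration): readings from underground carriages.
RAW_CSV = """line,passengers,temperature
L1,10,21.0
L1,,23.0
L2,30,24.0
L2,abc,26.0
L3,50,27.0
"""


def profile(text):
    """Summarise each column: missing cells, non-numeric cells and basic statistics."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames:
        values = [row[column] for row in rows]
        # A cell counts as missing if it is absent (None) or blank.
        present = [v for v in values if v is not None and v.strip() != ""]
        numbers = []
        for v in present:
            try:
                numbers.append(float(v))
            except ValueError:  # text such as "abc" or "L1"
                pass
        report[column] = {
            "missing": len(values) - len(present),
            "non_numeric": len(present) - len(numbers),
            "distinct": len(set(present)),
            "min": min(numbers) if numbers else None,
            "max": max(numbers) if numbers else None,
            "mean": mean(numbers) if numbers else None,
            "median": median(numbers) if numbers else None,
        }
    return report


if __name__ == "__main__":
    for column, stats in profile(RAW_CSV).items():
        print(column, stats)
```

A column like line is expected to be entirely non-numeric, so only its distinct count matters; in passengers, the one blank cell and the one "abc" are exactly the kind of clues that tell you what the cleaning step has to handle.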


Now that you have the clues to solve the Big Data case, take your detective licence and make Sherlock proud of you.

