Over the years, I have mentored and trained many people from different fields to become data scientists.
While I’ve often preached that you don’t need prior experience to become a data scientist, I’ve found the most interesting students to be those who have taken some sort of training before – usually online classes (MOOCs) or crash courses.
They are people who are looking to transition into a data science career. Why? Because a data science career, while lucrative, also comes with plenty of perks, including career future-proofing. But I’ll reserve that for another post.
Back to the students and individuals who are looking to become data scientists. I’ve found that many have a good idea of what being a data scientist involves, but more often than not, a handful of misconceptions hold them back from becoming one and cost them opportunities.
If you’re looking to become a data scientist and feel clueless about it, don’t worry. I had the same doubts and misconceptions that you have, so I’d like to walk through them in this post.
Misconception #1 – Data Comes In A Clean Format
Most data science newbies think data gathering is a simple task, where data usually comes in a clean format. Just download the data from the sources and plug and play, done – right?
Unfortunately, that is never the case.
In fact, the bulk of a data scientist’s time, up to 85%, is spent cleaning data. That may sound like an exaggeration, but if you consider the time you spend looking for accurate data, asking around and trying to make sense of the data, 85% seems about right.
Back then, I had to figure out how to download over 16GB worth of raw tweets without breaking my university’s network. Imagine cleaning over 50 million text records with vague field headers like A1, B2, ZYC, TMO, etc. And guess what: there is no such thing as documentation when it comes to data gathering and cleansing.
Lesson learnt: Data gathering and cleansing can take up to 85% of your time as a data scientist. Be prepared for that.
Misconception #2 – Choosing Cool Algorithms Makes You Appear Smarter
One of my favourite things while I was doing my PhD was a weekly session where I had to share my weekly report. Being a person who loves technology, I’d present results to my supervisors and explain the latest algorithm that I had implemented.
However, that alone does not earn you a PhD.
I had to spend extra hours explaining why the Hans Linear Allocation model was better than LDA. Eventually, I felt the time spent wasn’t worth it. So I took a step back and focused on working the data with existing models, and got much better performance.
From the beginning, I should have spent more time understanding the data and its quality, rather than looking for the next coolest algorithm.
Lesson learnt: Always choose good data over cool algorithms.
Misconception #3 – Forget About Visualization
No matter how good your algorithm, model and data are, if you can’t show them on a chart that your audience can understand, your findings, algorithms and models are as good as useless.
Many students I’ve taught put little effort into mastering visualization. After all, data scientists are supposed to work on data, not visualizations, right?
Well, visualizations are, in fact, what gives meaning to all your effort. This rule applies to audiences of all ages, whether you’re presenting your data to an 8-year-old kid, a professor or a CEO. In fact, you’ll be surprised to find that even professors can struggle with anything beyond a bar chart.
And please, no pie charts.
Lesson learnt: In data science, visualizations rule, period.
What other data science stories or lessons do you wish you had known before you became a data scientist?
If you’re still early on in your data science journey, feel free to run your thoughts by me before you move further into your data career. I’ll be happy to help!
P.S. If you want to learn more about data science and begin a career in it, you should join the 48-hour data science Bootcamp.