What is the difference between data science, big data and business intelligence?
Even as data science becomes a huge topic among corporations and businesses, there are still lots of misconceptions about it.
But regardless of your views on data science, there’s one thing we can all agree on: data science is here to stay. And the truth is, small and large corporations can all make use of it. If you want to get on board with digital transformation, I’d suggest looking for a data science course and getting yourself data-ready.
In this episode, we talk about:
- Data science vs Big Data vs Business Intelligence
- The role of a data scientist: To understand business or numbers?
- Learning Mathematics for Data Science
- Can smaller organizations start with data science?
Difference Between Big Data & Data Science
Reuben: In today’s episode we’re going to talk about misconceptions in data science & big data. Now to be honest, Dr. Lau, when I first met you and you told me about data science, I didn’t understand what data science was. I thought data science was big data and big data was data science. Tell us, what is the difference between big data and data science?
Dr. Lau Cher Han: You remember the meme, those one-liners that make your girlfriend angry? So, this is the one-liner that makes data scientists angry: you tell a data scientist, “Hey, data science and big data are the same.”
People always have the tendency to use the words big data and data science interchangeably. And this is totally intolerable. For the sake of this episode, I’m not going to go into the details of big data. So, comment below if you want to know more about big data. We will consider doing a separate episode.
So in layman’s terms, big data is just a term to describe, for lack of a better word, data that has too much volume and speed. It arrives so quickly and the amount is so huge that current technologies, systems, servers, algorithms, and software cannot handle it. Then we just call it big data. This is what big data actually means.
In contrast, data science is more like the science behind it. Data science deals with everything from data mining, transforming the data, gathering the data, cleaning the data, modeling the data and storing the data, all the way to what we talked about in the previous episode: storytelling, presentation and more. So to build models, algorithms, and stuff, that is what data science does. So they are two separate things, big data and data science.
From a more technical point of view, big data is trying to solve a problem that is more about infrastructure: the server side, the back-end side, so the storage of the data. Then we use different technologies to speed things up for performance. Data science is probably not a good name; it’s more like art nowadays. It comprises everything.
Is Business Intelligence The Same As Data Science?
Reuben: Now the second misconception that we always get, is that a lot of people think business intelligence is the same thing as data science. So, what do you think?
Dr. Lau Cher Han: Okay, this one won’t piss off data scientists, but perhaps the BI guys. Business intelligence is another subset of data science. If you look at the chart again, this is the Gartner Analytic Ascendancy Model.
We have descriptive analytics, diagnostic analytics, predictive analytics, all the way to prescriptive analytics. Now the prescriptive part is the AI part that people talk about nowadays. So, how we can make things happen.
And descriptive is where BI (Business Intelligence) falls. BI is more about descriptive analytics, where we focus a lot more on the reporting side and visualizations. But the steps before we get to BI, like data gathering and data cleansing, are the things that are never mentioned. This is the dirty work behind the scenes. That is also categorized as part of data science, and that is where data scientists spend a lot of time.
And then we talk about modeling and artificial intelligence afterward. Yeah. So, BI is designed, in a sense, to look backward. That is why it’s called hindsight. And it’s based on real data and real events.
Data science is more like looking forward, where we are entering uncharted territory to look at things that are unknown. So, in BI, you deliver reports, and that helps you fulfill KPIs and see trends. But it doesn’t really tell you what things might look like in the future. So, this is the key difference between data science and BI.
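To make the hindsight versus looking-forward contrast concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sales figures and the function names `describe` and `forecast_next` are hypothetical and not from the episode, and a straight-line fit stands in for whatever model a real project would use.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative numbers, not from the episode).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(sales):
    """BI-style hindsight: summarize what has already happened."""
    return {
        "total": sum(sales),
        "average": mean(sales),
        "best_month": sales.index(max(sales)) + 1,  # 1-based month number
    }


def forecast_next(sales):
    """Forward-looking estimate: fit a straight line by least squares
    and extend it one month past the end of the data."""
    n = len(sales)
    months = list(range(1, n + 1))
    x_bar, y_bar = mean(months), mean(sales)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales))
    variance = sum((x - x_bar) ** 2 for x in months)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n + 1)


report = describe(monthly_sales)            # what happened
prediction = forecast_next(monthly_sales)   # what might happen next month
print(report)
print(round(prediction, 2))
```

The report only restates months that are already over, while the forecast is an estimate about a month nobody has seen yet, so it can be wrong in a way the report cannot.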
Should Data Scientists Understand Business or Numbers?
Reuben: To add to that question, lots of people think that data scientists are tech guys who are good with numbers but don’t understand business. What do you have to say to that?
Dr. Lau Cher Han: As a data scientist, you need not be the subject matter expert. For example, I have been involved in projects related to retail, logistics and what some call IoT sensors. I’m definitely not the expert in those fields compared to people who have been working in the industry for many years. But what we data scientists need to develop is the sense and ability to get into a business as quickly as possible. We must do that every time we enter a new vertical or a new industry.
We want to be able to understand their lingo, their terminology and the words that they use. Only from there can we identify the pain points and ask the right questions. Every now and then, people will be talking about answering questions or asking the right questions.
As for how we tell a good and a great data scientist apart, one of the key things is not to look at how good you are at answering questions but at how good you are at asking questions. So I would say you don’t need to be the subject matter expert in that particular business, but you definitely need to be able to use data to give them a fresh perspective so that they can look at problems and innovate from there.
Learning Math For Data Science – A Must?
Reuben: The next misconception that we have: do I have to be good at mathematics to be good at data science?
Dr. Lau Cher Han: Okay, nobody would deny the fact that mathematics is the core of data science. But again, you don’t have to be very good at math in order to become a data scientist. I’m the best example. Most people of my age, when we learned math, actually learned by memorizing it. So I always ask my students, what is the point of memorizing “a squared plus b squared equals c squared”? You probably know it’s Pythagoras’ theorem, but what is the real-life application of it?
You get my point, right? So, when I was learning data science, I actually didn’t think too much about mathematics. I focused more on the logic and the application itself. I learned how to apply the formula and apply the algorithms, rather than just the actual math itself. It was only when I was doing my Ph.D. that I looked at the maths and tried to understand the fundamentals, because that is the only time when you need to understand how things work. Otherwise, you can’t make any changes or improvements based on that.
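One possible answer to the real-life application question above, offered as a sketch rather than something stated in the episode: the same theorem gives the straight-line distance between two data points, which is what distance-based methods such as nearest-neighbor search compute. The customer names, the two features and the helper `distance` below are hypothetical.

```python
import math


def distance(p, q):
    """Straight-line distance between two 2-D points, i.e. c = sqrt(a^2 + b^2)."""
    a = q[0] - p[0]
    b = q[1] - p[1]
    return math.hypot(a, b)


# Hypothetical customers described by (visits per month, average basket size).
customers = {"alice": (3.0, 4.0), "bob": (6.0, 8.0), "carol": (1.0, 1.0)}
new_customer = (0.0, 0.0)

# The existing customer most similar to the new one is the closest point.
nearest = min(customers, key=lambda name: distance(new_customer, customers[name]))
print(nearest)
```

This is the "apply the formula" level of understanding described above: you can use the distance without deriving it, and only need the underlying math when you want to change how similarity is measured.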
Doing Data Science With Small Data Sets
Reuben: A lot of people think that in order to do data science or data analytics in a business organization, you need access to lots and lots of data.
Dr. Lau Cher Han: I would say you need a lot in the sense that you need wide coverage of data. Now let’s take e-commerce as an example. When most small companies and startups begin, they always think that their company is very small, so they shouldn’t get this, they shouldn’t get that, and there’s always a next important thing to do. But that is not the case.
Starting small gives you very focused data, and usually that data is very high quality. Whereas when your data gets bigger and bigger, it’s like garbage in, garbage out. If your data is higher quality, you get good results and you can test out models easily. Whereas if your data gets too big, then you have to start thinking about strategy: how to partition the data, split it, organize it and whatever, right? All in order to be able to process it. Remember, the three key things in a data science project are always to start small, think big and be able to scale fast.
Can Smaller Organizations Do Data Science?
Reuben: That leads me to my next misconception. A lot of people think that data science is a very high-level thing that is only meant for large organizations, and that small organizations can’t use data science: they don’t have it, they don’t use it. So is that true?
Dr. Lau Cher Han: No, that’s definitely not the case. Large organizations need data science as much as a small organization needs it. Let’s not even say “small organization”; this goes for small companies, small groups of people, small entities. You know this when you run Facebook ads or Google Analytics: it doesn’t matter how much traffic comes to your website, you have to start collecting data from day one and put it in the right place.
Large organizations have this problem of data silos. They have data which has been collected for many, many years, all in different formats, different storage, and different systems. So they have to siphon out the data and clean up unusable data. And telling which data is usable and which is not can take a couple of months of effort when we are just entering the first data science project. Not just that: if you don’t do those things, then you can’t really move on to discussing AI or machine learning. Just extracting useful information from it is going to be hard. Imagine you have a pile of files and they are fully covered by dust, and you try to go in and extract them piece by piece.
Reuben: So that’s to say if I’m a new company, I should start doing data science before it gets too huge to do anything.
Dr. Lau Cher Han: That’s actually quite a good way to say it. Like I mentioned in the previous question, a small company always neglects the importance of collecting data, because you always have the next important thing to do: you need to call this client now, you need to fix the colors of your website, you need to go for meetings and stuff. And that’s not the worst part.
Remember when you set up Google Analytics and you want to use the data? When you need some data for cross-referencing, or for a market study, you do the research and then you realize that your data is incomplete. It happens a lot. You forgot to track a particular sales button on a particular page for its conversion rate, right? We are talking about digital marketing here.
So, most people can relate. Sometimes you get lucky. Let’s say we are talking about blog posts: you can still recover the data, but it will take you a couple of weeks or even months to gather it. And sometimes, if the event is over, we can’t find that data anymore. So, it’s better to spend some time setting things up before it’s too late.
Will Learning Python Make Me A Data Scientist?
Reuben: In order to become a data scientist, all I have to do is to learn Python or programming.
Dr. Lau Cher Han: Ok, most of you know that when I talk about data science or when we teach, we usually use tools like Python or R, especially the command-line type rather than the user-interface type.
Now, I always tell my students that Python is just a tool. It’s not the only tool that data scientists use. For example, if you’re a carpenter and you are very good at using the hammer, it doesn’t make you a great carpenter. It just makes you a good carpenter, and maybe only in certain things. Being very good at using a frying pan doesn’t make you a great chef.
Whenever people start to learn something, well, data science is such a new field. I would say if you’re just starting to enter data science, don’t think so much. Yeah don’t think so much and the easiest way is always to ask people like myself or other peers who are already in the field of data science. Everybody has their own version of something so listen and be open-minded. That’s the fastest way forward.
What other misconceptions do you have when it comes to data science, big data and data analytics? Let us know in the comments section below, and don’t forget to like us on Facebook and subscribe to our YouTube channel.