Tuesday, November 30, 2010

Are you a Data Engineer or a Data Scientist?


Recently, two new job titles have been making headlines in the data world 🌎:
Data Engineer and Data Scientist. It all began with data analytics.

Data is of little use unless we derive information from it, and that information should be insightful. The tools that let you do this are known as data analytics products. People working in analytics are often given titles such as Engineer or Scientist.

Why does it matter at all? 

Well, it matters because it shapes your career road map. To know which of these broad categories fits your true nature, you need to understand the difference between them.

Data Engineers 

Decisions backed by data and analytics can provide a competitive advantage and increase ROI.
Hence it is critical that data analytics solutions implemented in your enterprise are fast and efficient.

If you are a data engineer, you are expected to design, develop, implement and support products that deal with data, whether structured (master, reference, meta, transactional or analytical) or unstructured, and to curate data pipelines and move data to production. Data engineers make the appropriate data accessible and available to the right users at the right time by enabling secure, compliant data utilization and democratization across the enterprise.

You are likely a data engineer if you have more fun with the technical side of data, such as designing, building and arranging the different components of a data flow architecture. The role is closely related to that of a data architect; in fact, when traditional data architects start designing and handling Big Data systems instead of data warehouses, they are known as data engineers.
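To make the idea of a data pipeline a little more concrete, here is a minimal sketch of the extract, transform and load steps a data engineer might build. It uses only the Python standard library. The sample records, the `orders` table and all function names are assumptions made up for illustration; a real pipeline would read from files, APIs or queues and write to a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input for illustration; a real source would be a file or API.
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,
3,carol,75.00
4,,30.00
"""


def extract(text):
    """Read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop incomplete records and convert fields to proper types."""
    clean = []
    for row in rows:
        if not row["customer"] or not row["amount"]:
            continue  # skip records with missing values
        clean.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return clean


def load(rows, conn):
    """Write cleaned rows into a table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Chain the three stages: extract, then transform, then load."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
run_pipeline(RAW_CSV, conn)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(row_count, total)
```

Keeping each stage as its own small function is one possible design; it lets each step be tested and replaced independently as the data sources change.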

Data Scientists

Data scientists give meaning to your data. Basically, if you can read and understand trillions of data points and use analytics to predict a situation or trend, you are a data scientist.
A Data Scientist collects, interprets and publishes data. The data they find can be used for many reasons, but in the business world, it often applies to finances or productivity. For example, a Data Scientist may look at sales figures against business decisions made over a certain time frame to determine how successful those decisions were. 

These professionals provide the forecasting knowledge a business needs to know whether changes will be effective before making a decision. Data Scientists work in a variety of industries, such as IT, healthcare, finance, retail and marketing.

To do their job efficiently, data scientists typically follow the approach below:
  1. Ask the right questions to begin the discovery process
  2. Acquire data
  3. Process and clean the data
  4. Integrate and store data
  5. Initial data investigation and exploratory data analysis
  6. Choose one or more potential models and algorithms
  7. Apply data science techniques, such as machine learning, statistical modeling, and artificial intelligence
  8. Measure and improve results
  9. Present final result to stakeholders
  10. Make adjustments based on feedback
  11. Repeat the process to solve a new problem
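A few of the steps above (acquire, clean, explore, model, measure) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spend and sales numbers are invented, the function names are hypothetical, and a simple least-squares line stands in for whatever model a real project would choose.

```python
# Hypothetical data: (advertising spend, sales) pairs; None marks a missing value.
raw = [(1.0, 3.1), (2.0, 4.9), (3.0, None), (4.0, 9.2), (5.0, 10.8)]


def clean(records):
    """Process and clean: drop records with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def explore(records):
    """Exploratory analysis: basic summary statistics."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    return {"n": len(records), "mean_x": sum(xs) / len(xs), "mean_y": sum(ys) / len(ys)}


def fit_line(records):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    stats = explore(records)
    mx, my = stats["mean_x"], stats["mean_y"]
    sxx = sum((x - mx) ** 2 for x, _ in records)
    sxy = sum((x - mx) * (y - my) for x, y in records)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(records, slope, intercept):
    """Measure: share of the variance in y explained by the fitted line."""
    my = sum(y for _, y in records) / len(records)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in records)
    ss_tot = sum((y - my) ** 2 for _, y in records)
    return 1 - ss_res / ss_tot


data = clean(raw)
slope, intercept = fit_line(data)
r2 = r_squared(data, slope, intercept)

# Present: a short summary a stakeholder could read.
print(f"sales = {slope:.2f} * spend + {intercept:.2f} (R^2 = {r2:.3f})")
```

In practice the loop does not end here: feedback from stakeholders usually sends you back to the cleaning or modeling step before the next question is tackled.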

Kinshuk Dutta

Nov. 2010

I recently found this excellent pictorial representation on DataCamp, which echoes my thoughts.
