The Harsh Truth of Being a Startup Data Scientist

Misconceptions about Data Science will leave you disappointed

Erick Duran
Analytics Vidhya


Photo by cottonbro from Pexels

Over the last few years, demand for Data Scientists has increased significantly as more of the economy has gone digital. Data Science became popular, and Harvard Business Review described the Data Scientist as “The Sexiest Job of the 21st Century”. This demand also increased the supply of Data Scientists, as universities started offering Data Science courses.

When starting a Data Science career, it is common to look for places where you can gain experience. Early-stage Startups typically hire people with little experience because they can’t afford a large team or the higher salaries of more experienced Data Scientists.

At the start of your career, it is common to doubt your role in the company and to find that the job does not match your expectations. This disappointment usually comes from a lack of experience and from misconceptions about the job.

The Hierarchy of Needs

Image from “The AI Hierarchy of Needs” article in Hackernoon by Monica Rogati

There are a lot of misconceptions about Data Science online. The main purpose of Data Science is to use data to create as much impact as possible for an organization.

The Data Science hierarchy of needs is a pyramid that lays out every role in Data Science. Aggregation and Labeling is the most important of these roles. It includes A/B testing, experimentation, simple ML algorithms, analytics, metrics, segments, aggregates, features, training data, cleaning, anomaly detection, and preparation. This is where Data Scientists have the greatest impact on an organization, because their work tells the organization what to do.

Your role as a Data Scientist depends on the size of the organization you work for. Big companies have the resources to follow the hierarchy of needs and assign a person to each role. Startups, by contrast, may lack the resources to split roles because they can’t afford many Data Scientists.

A Data Scientist at a Startup might have to do everything below AI and deep learning, because those layers have higher priority. The roles at the bottom of the pyramid include software engineering for collecting data and Data Engineering for structuring it. These roles can be difficult for inexperienced Data Scientists with little knowledge of computer science.

Data collection is the most basic role in every organization, but for Startups it is the biggest priority because they start from zero. Data collection is the systematic gathering and measuring of information from various sources to obtain a complete and accurate picture of an area of interest. It enables the business to answer relevant questions, evaluate results, and better anticipate future probabilities and trends.

Workload nightmare

The roles below AI and Deep Learning in the pyramid require a lot of effort, which is hard to sustain when there are only one or a few Data Scientists. Employees can’t specialize in a role, the quality of the work suffers, and a heavy workload can bring stress and frustration. In a Deloitte survey of 1,000 US professionals, 77% of respondents said they have experienced burnout in their current job. Blind also surveyed tech employees: 57.16% of 11,500 respondents said they are suffering from job burnout.

According to Kronos studies, the main causes of burnout are poor pay (41%), an unfair amount of work (32%), and excessive overtime (32%). Even though early-stage Startups don’t have massive amounts of data and don’t prioritize AI and Deep Learning, delivering data collection, movement, and storage at the bottom of the pyramid is crucial for the business to function.

Data Scientists in a Startup should organize their priorities well. Data collection is the most important task at the very beginning, and it requires the Data Scientist to write code and do the work of a software engineer. Structuring the data comes next and requires some knowledge of Data Engineering. Data Analysis is the last priority for a Startup at this stage.

Roles in bigger companies are split into teams. Companies hire Software Engineers for data collection, Data Engineers for moving and storing data, Data Analysts for aggregation and labeling, and Research Scientists or Machine Learning Engineers for learning and optimization. Splitting roles makes the work more efficient because each person can specialize.

Working in a bigger company doesn’t guarantee less stress, because the workload can be high given the greater amount of data and more complicated challenges. The nightmare usually occurs in Startups when Data Scientists begin their work with a misconception about their role.

Most beginner Data Scientists expect to work in a specific role such as aggregation and labeling or learning and optimization. This is usually true only for bigger organizations, although some Startups have enough resources to hire more employees and split roles.

Not everything about working across the different Data Science roles is bad. It is a great experience where you can learn a lot. After working in every role, you can decide which path to take in your career.

Working for a Startup has been one of my favorite professional experiences. We were only two Data Scientists, so splitting roles was really complicated. With good organization, the job became more efficient and less stressful. There was little prediction and analysis work and much more collecting and structuring. My coding and data engineering skills improved, which made me a better Data Scientist and a better professional.
