[#8] Data Science, in General
Should companies prioritize hiring generalists or specialists, and how does that affect your job search?
There is some debate in the field about “generalist” versus “specialist” data scientists. A “specialist” data scientist is an expert in a specific part of the data collection and storage process, or in a specific algorithm or set of techniques. A “generalist” knows a bit about many algorithms and about every step of the data science process, from collecting and processing data through to a production-ready machine learning model.
Some argue that data science teams should be filled with “generalists” (in other words, “full stack data scientists”). Eric Colson, former Vice President of Data Science and Engineering at Netflix, argues that specialization should only be preferred when efficiency is paramount, such as in a manufacturing assembly line. For a data science team, on the other hand, the “goal should be to learn, not be efficient”.
Another argument in favor of generalists is that “while specialists are excellent at reproducing work that they are well-practiced at, sometimes they struggle to navigate uncharted territory where rules are not well defined”.
This ties back to my previous post, where I discussed how varied the paths into data science can be. The following quote explains how a circuitous path to data science can be an asset, particularly when a generalist is needed:
But while their career path might seem chaotic on paper, in truth it provides a vast array of experience that can be called upon for problem-solving — which is the bread and butter of data science and analytics.
However, even the proponents of generalists admit some potential exceptions. Colson points out that: “if you are dealing with petabytes or exabytes of data, specialization in data engineering may be warranted.” Furthermore, if the business needs of the company favor “keeping a business capability online and available” over experimentation and trying more cutting-edge solutions, specialists may be preferred.
Overall, a common thread seems to be that the balance between hiring generalists and hiring specialists depends on the size and maturity of the company. A small start-up may be best served by generalists who can quickly put very different pieces together and produce a working prototype. On the other end of the scale, a very large company can afford to hire specialists for each part of the data-driven development cycle, and may benefit from incremental improvements that, say, an expert in a particular machine learning algorithm can bring in terms of accuracy and/or efficiency.
So, what do you do if you are trying to get into the data science profession, or trying to find a new job? Should you try to specialize in a specific skill or try to stay general? Some advice on that:
If you’re starting out — don’t decide to be a generalist or a specialist — figure out what you personally are good at and then forge your own unique path. If you enjoy going very deep on one or two topics or want to be known as an expert on a subject, you’re probably going to enjoy being a specialist and should look for jobs that cater to your incredible and focused technical ability. If you have a very wide range of interests or get bored fast, then you probably are a generalist. Get technical enough that you can meet the minimum requirements for a technical interview but broad enough that you can architect solutions (for example, build a business intelligence program from scratch, or create a machine learning pipeline from end-to-end, or build a web-app that takes data from multiple sources).
Additional advice for a generalist is to “take time to get knowledge and skills that will supplement your analytics profession… Business, finance, marketing, design, and product are a few examples of great multipliers for an analytics professional.”
In other words, unless there is some very specific area of data science that really interests you (and that is currently in demand), focus on building the general technical skills that supplement your current life and work experience and will serve as a solid foundation for many generalist data science positions.
Python Corner — Append, Extend, Comprehend
Since many data scientists, myself included, use Python, I will occasionally share tips and tricks or interesting packages.
If you have used Python even a little, you are probably familiar with lists, which can store pretty much anything: integers, floats, even pandas DataFrames.
Here is a tutorial on building lists using .append(), .extend(), and list comprehensions. If you're not familiar with list comprehensions, they essentially put a for loop on a single line.
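To make the difference concrete, here is a minimal sketch (my own example, not taken from the tutorial; the variable names are purely illustrative) that builds the same list three ways and shows how .append() and .extend() treat a list argument differently:

```python
# Build the same list of squares three ways.

# 1. .append() adds a single item to the end of the list.
squares_append = []
for n in range(5):
    squares_append.append(n ** 2)

# 2. .extend() adds every item from an iterable, one at a time.
squares_extend = [0, 1]
squares_extend.extend([4, 9, 16])

# 3. A list comprehension puts the for loop on a single line.
squares_comp = [n ** 2 for n in range(5)]

# Comprehensions can also filter with an if clause.
evens = [n for n in range(10) if n % 2 == 0]

# Contrast: .append() with a list adds it as ONE nested element,
# while .extend() unpacks its elements into the list.
nested = [1, 2]
nested.append([3, 4])
flat = [1, 2]
flat.extend([3, 4])

print(squares_append, squares_extend, squares_comp)  # [0, 1, 4, 9, 16] (three times)
print(evens)         # [0, 2, 4, 6, 8]
print(nested, flat)  # [1, 2, [3, 4]] [1, 2, 3, 4]
```

A common slip is calling .append() when you meant .extend(), which silently produces a nested list rather than raising an error.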
They also introduce another, less well-known data structure from Python's collections module called a deque (short for "double-ended queue"). The deque has an .appendleft() method in addition to .append().
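A short sketch of a deque in action (an illustrative example of my own, not from the tutorial). It assumes only the standard library's collections.deque and shows adding and removing items at both ends, plus the optional maxlen argument:

```python
from collections import deque

# A deque supports fast appends and pops at both ends.
queue = deque([2, 3])
queue.append(4)      # add to the right end
queue.appendleft(1)  # add to the left end
print(queue)         # deque([1, 2, 3, 4])

# Removing also works from either end.
right = queue.pop()     # removes and returns 4
left = queue.popleft()  # removes and returns 1
print(left, right, queue)  # 1 4 deque([2, 3])

# The list equivalent of appendleft is insert(0, x), which has to
# shift every existing element; deque.appendleft does not.
items = [2, 3]
items.insert(0, 1)
print(items)  # [1, 2, 3]

# With maxlen set, a deque acts as a bounded buffer:
# appending to a full deque discards an item from the opposite end.
recent = deque(maxlen=3)
for value in range(5):
    recent.append(value)
print(recent)  # deque([2, 3, 4], maxlen=3)
```

As a rule of thumb, a plain list is fine when you only add to the end. A deque pays off when you regularly add or remove items at the front, for example in a queue or a sliding window of recent values.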