[#7] All Roads Lead to Data Science

There are many unique paths to becoming a data scientist

I don't spend much time on Twitter, but a number of data scientists are active there and, apparently, interesting discussions sometimes arise. A few weeks ago, Randy Au wrote about how a bunch of data scientists were responding, tongue in cheek, to the idea that there is one true path to becoming a data scientist. His post includes a number of these tweets, which show the variety of paths people took. Judging by how they describe them, these paths were not carefully planned out from the beginning.

In Randy Au's post, he writes the following, which I really liked:

Nothing is ever completely irrelevant so long as you’re willing to draw parallels.

I would actually change this slightly to:

Nothing is ever completely irrelevant. Period.

I know that when I started my career as an astronomer, I didn't even know what data science and machine learning were, but I had a feeling that the work I was doing, and the skills I was developing, as an academic researcher would come in handy even if I left academia and moved into a different career.

One thing these data scientists' paths have in common is that there was a path. None of them went straight from graduating college to working as a data scientist. The job of data scientist requires a variety of skills, some of which come from work and/or life experience.

All of these different experiences provided these data scientists with the skills, perspective, and unique view of the world that form the basis of who they are as data scientists. Nothing about these experiences is irrelevant.

These discussions of the different roads that lead to a data science career are certainly inspiring and can help give confidence to those who feel that they don't have the "right" experience, or the "right" training. There is no one true path to data science that you absolutely must take to have a chance.

However, what advice does one give to someone who is not a data scientist and wants to be one? The people who tweeted about their experiences were responding to those who insist that they know the one true path, but you still have to break into the field in some way.

The common advice is to learn Python or R (if you don't use either already), take a couple of online courses, and come up with a data science side project to showcase your skills and ability (see these articles, for example).

I recently posted an article on Medium in which I looked at four years of Kaggle data science surveys to identify trends in the field. Not surprisingly, Python is the dominant programming language, and, in fact, R is decreasing in popularity.

One potential inspiration for aspiring data scientists is that the fraction of employed data scientists with a graduate-level degree, like a Master's or PhD, is slowly decreasing. Furthermore, despite Python's dominance, there is still a small, but consistent, fraction regularly using other tools, like Matlab, at work.

When looking at the algorithms that data scientists regularly use, linear and logistic regression remain dominant, with over 80% using these more "basic" algorithms. After these, the next most popular are random forests and gradient boosting machines (such as XGBoost), not neural networks. In fact, the popularity of traditional dense neural networks is declining.

On the one hand, these data confirm that there is not one fixed set of skills or set of tools required to become a data scientist.

On the other hand, it is worth keeping in mind some of the dominant trends when focusing one's data science training. Going into the interview process, it would appear that having a solid grasp of linear and logistic regression is essential. Furthermore, instead of focusing on trendier topics like neural networks, look at tree-based models first. One caveat: if you are specifically interested in fields like image recognition or NLP, algorithms like CNNs and RNNs have been increasing in popularity over the past few years.

In other words, use the data to guide your journey along your own unique road to data science.


The Job Search

If you are reading this, it is likely that you are either applying for jobs now or will find yourself applying for a new job in the future. Occasionally, I will include a job search or interview tip at the bottom of this newsletter.

With my Medium article looking back at the past four years, how about asking ourselves where we see ourselves in the next four (or five) years?

Here is a very common interview question:

Where do you see yourself in 5 years?

I find this to be a rather annoying question, and it can be difficult to know how best to answer it. Here are a couple of resources to help with this:

Enjoy the rest of your week!