Data scientists: the superheroes of data

Most of the data science practitioners here are not doing the hardcore data science. They are either cleaning data, handling business intelligence or data analysis. This could be due to the market needs now, the readiness of companies to embrace data science or unfortunately, the lack of hard skills.
— Joanna Yeoh, Director, ConnectOne

Look at a job description for a software engineer these days and you’re likely to stumble across the phrases, “big data”, “data scientist” or “machine learning.” Data is truly the celebrity of our times. As we go online more and more, businesses are seeing unprecedented volumes of data (forget the terabyte, we are talking petabytes here!) enter their systems. Businesses not only have to tame this voluminous data, but they also need to interpret the streams of raw data flowing into their system constantly. With this deluge of data pouring in, businesses are desperately looking for help from the superheroes of data: the data scientists.

For international companies that have clients in Singapore and the region, there is a compelling need to source local data science talent. Yet this pool of talent proves elusive for smaller companies that need the heavyweights but may not have the resources to bring their own talent with them. Compounding the issue is the mismatch between responsibilities and skills: data scientists often get mixed up with other data experts such as data analysts and big data professionals.


Let’s be clear: what is data science?

“Data science usually refers to either machine learning or deep learning. With deep learning, you need a huge amount of data in order to make sense of it. In the real world, there are not that many areas for deep learning because of data limitations. Take us for instance, you can’t force more transactions to happen in the Singapore housing market so we can have more data. While deep learning is often the focus of the media, machine learning definitely has more real-world applications.”
— Mike Cho, Founder, UrbanZoom

It wasn’t too long ago that the work of data scientists was the same as that of computer scientists. As the volume of data kept growing, the discipline morphed from computer science to business analytics and data analytics. With the dawn of Big Data, it shifted again to what we now think of as data science. How it evolves from here depends on the state of future data.

Surprisingly, an early use of “data science” can be traced all the way back to 1996, when it appeared in the title of a conference. But it was really the 2012 Harvard Business Review (HBR) article that made the term mainstream. The HBR article profiled data science pioneer Jonathan Goldman and his work on recommending connections to LinkedIn users, bringing the term “data scientist” as we know (and love!) it today into popular usage.


Can a data scientist by any other name still be a data scientist?

After the 2012 HBR article, data science started trending as a field and there was a sudden surge in “data science” courses. But as it turns out, many were business analytics courses that had rebranded themselves to cash in on the data science trend. This led to confusion in the market around what these different data experts did. In addition to existing analysts, there was a new breed of data experts, but they seemed to be doing the exact same work as data and business analysts!

Thankfully, six years on, more clarity has emerged. It’s now clear that all three disciplines deal with statistical models and use data to drive decision-making. It’s this overlap that has created a fluidity between these roles that can be confusing. Let’s take a look at the three most commonly confused data expert roles:

  • Business analytics typically deals with information from the past to gain insight for business planning. Business intelligence, a subset of business analytics, is interested in historical business data. Both rely on descriptive statistics and on reporting or visualizing data about past events. They work with data science when they need predictive modeling to gain further insight into the data.

  • Data analytics occupies more of a grey area and can overlap with the work of a data scientist in early-stage startups. Typically, it deals with structured and “cleaned up” data to draw conclusions and provide insights, but more experienced analysts could map raw data and convert it for consumption. Data analysts are often also responsible for data visualizations for the business, creating reports and dashboards.

  • Data scientists can dabble in data analysis in the course of their work, but their primary focus is to engage with data from multiple streams to detect patterns that can help them create predictive models. They also bring structure to formless data so that statistical analysis is easier and more accurate. They can choose to specialize in data cleaning, data shaping or machine learning.
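
To make the distinction concrete, here is a minimal sketch in Python. The sales figures are made up purely for illustration, and the function names are hypothetical; real work in any of these roles involves far messier data and richer models. The first function summarizes the past, as business and data analytics typically do, while the second fits a simple trend line to predict the next value, which is closer in spirit to the predictive modeling associated with data science.

```python
from statistics import mean

# Hypothetical monthly sales figures (made-up numbers, for illustration only).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive view: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(values):
    """Fit a least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def predict_next(values):
    """Predictive view: extrapolate the fitted line one step ahead."""
    slope, intercept = fit_line(values)
    return slope * len(values) + intercept


summary = describe(monthly_sales)      # what happened
forecast = predict_next(monthly_sales)  # what might happen next month
print(summary, forecast)
```

A straight-line trend is the simplest possible stand-in for a predictive model; it is used here only to show where reporting ends and prediction begins.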


Singapore: A unique challenge

First, the facts. Singapore is serious about data. According to the EDB, data analytics contributes about S$1 billion to the Singapore economy annually, with regional data analytics services projected to reach S$27 billion. Singapore is also playing host to Facebook’s data center, poised to open in 2022, as well as Alibaba’s first joint research institute outside China.

Second, the talent. An overview of LinkedIn shows that there are about 2,000 professionals in Singapore who identify themselves as being in data science. The experience of these professionals seems to be evenly distributed: individuals who report more than 10 years, 6 to 10 years, and 3 to 5 years of data science experience each make up a similar proportion. Unsurprisingly, a large proportion of these candidates are employed by Grab.

Despite these promising numbers and the support from the government, the challenge to find high-quality talent in abundance persists.


What are the superpowers of a data scientist?

“I look for 3 key sets of attributes:

(i) a fundamental understanding of math and statistics, (ii) the ability to code beyond the lab to code in production, and (iii) the ability to communicate within a specific domain. The last one is probably the most important. We’re not looking for a generalist. We need someone who can ask the right questions and apply data science appropriately to solve a problem within a specific domain.”
— Ken So, Founder, Flowcast

Who better to shed some light on what the market is looking for than the market itself? We mined insights from two startup founders: Ken So, founder of Flowcast, a Silicon Valley fintech startup with clients in Singapore, and Mike Cho, founder of UrbanZoom, Singapore’s first AI property valuation tool.

Both Ken and Mike are in agreement on what they look for in a data scientist:

A well-rounded technical person who has the soft skills and capability to apply theory to real-world issues. If it sounds like everything, well, that’s what sets apart exceptional data scientists from the average!

A technically strong data scientist makes analysis easier but, more importantly, creates an ongoing relationship with data so that it can be meaningfully used to support business decisions. But that’s not enough! A technically strong data scientist who also has the relevant soft skills will be able to hold their own in a business setting, using communication skills to tell a story with the data, shaping stakeholder perception and guiding business decisions.

Mike says, “Aside from the usual expectations of being self-reliant and a team player, a data scientist should ideally have the full suite of skills, with exposure in data gathering, data cleaning, data shaping, and machine learning. Frankly, for our work, algorithms are about 10% of the work. 90% is about cleaning up real-world data because real-world data is dirty. Only a full suite data scientist would be able to deal with it adequately.”


Challenge of grooming talent in the region

“Not many data science candidates come in with the commercial and practical experience we tend to see in Silicon Valley.”
— A pressing difference in data scientists in SV and Asia

The challenge for grooming talent is exposure. Ken points out, “In the region, excluding China, there is an expectation that Singapore will have the concentration of talent. However, there is a noticeably heavy emphasis on academic thinking and theory that we see in candidates from around here. Not many data science candidates come in with the commercial and practical experience we tend to see in Silicon Valley.” He adds, “US companies have been investing in Big Data and its infrastructure for years. Now they are in the phase of deploying data science solutions. By contrast, Asian companies are still behind in that investment and deployment cycle.”

“Top talent is always going to be where the big data is going to be.”
— Mike emphasised the challenge of finding top talent in a small market

Mike believes one of the key issues is the inherent market size. “I have been to data science meetups in Singapore and they are undoubtedly well qualified, but the discussions in Silicon Valley are definitely more sophisticated and have more depth because of the problems they are trying to solve.” He says, “If your market is only going to be Singapore, there just won’t be enough data. Top talent is always going to be where the big data is going to be. In the size of a market like Singapore, the data is unlikely to present the interesting problems required for deep learning.”

While the facts seem sobering now, the government’s interest and investment in data will definitely boost the industry and the quality of talent available. In early 2018, NUS partnered with Grab to set up an AI lab with an initial investment of S$6 million, while NTU’s Data Science and Artificial Intelligence Research Centre has attracted support from leading tech companies Nvidia and PayPal. With such big money backing the data science scene, it’s clearly a matter of time before talented data scientists start blossoming and match up to the expectations of the industry.