How to get the most out of your data science initiatives?

Have solid data infrastructure in place

  • Timely. Old data is of limited use for business decisions. An important question to settle is “how old is still okay.” Collecting data with one second of latency is much more expensive than collecting it with one hour of latency, and many organizations will still extract most of the benefit from data that is a day or even a week stale. It is important, however, to be deliberate about where the line between current and historical data lies.
  • Relevant. This may sound redundant, but data needs to answer questions that are of interest to the organization. Figuring out what these questions are is more “data art” than “data science.”
  • Systematic. Ideally, we want data that is “complete” and captures an entire population of interest, whether consumers, employees, products, etc. Sometimes this is not entirely possible, because of the cost of data collection or because of compliance mandates that apply to certain data categories, such as Personally Identifiable Information or Personal Health Information. In such cases statistical samples are still very powerful, but a systematic sampling frame is required to satisfy the assumptions on which sampling theory is based.
  • Consistent. Ideally, we want uniform definitions of what each unit of analysis means. To take one example, a user account is meant to represent one person, and a person is usually meant to have a single user account on any online platform. This assumption is often violated, however: a single account may be shared by an entire family or small business, and one person may create multiple accounts. Enforcing a consistent definition of a user is quite a tricky task for big Internet platforms that live and die by their number of users! The problem of “it’s complicated” is ubiquitous wherever humans collect data!
  • Discoverable. Information in organizations tends to be siloed, with particular datasets belonging to certain departments. The situation is sometimes further complicated by datasets stored in different formats or in multiple legacy data warehouses that are difficult to query. Discoverability does not mean simply building a search engine (although that helps); it also means making people aware that data assets exist at all. This can be surprisingly hard amid the cacophony of internal communication tools that characterizes the early 21st-century corporate workplace. Good information management practices ultimately mean faster data science: the more time your data science teams spend finding, accessing and cleaning data, the less time they spend on other valuable tasks, such as providing better insights for your business decisions.
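
Two of the properties above, timeliness and systematic sampling, are easy to make concrete. The following is a minimal Python sketch, not a prescribed implementation: the function names `is_fresh` and `systematic_sample`, the one-day staleness tolerance, and the 1-in-10 sampling step are all assumptions chosen for illustration. The first function checks whether a dataset falls within an agreed staleness window; the second draws a systematic sample by taking every k-th unit from a sampling frame, starting at a random offset.

```python
import random
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated, max_age, now=None):
    """Return True if the data was updated within the accepted staleness window."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


def systematic_sample(frame, step, seed=None):
    """Take every `step`-th unit of the frame, starting at a random offset in [0, step)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    start = random.Random(seed).randrange(step)
    return frame[start::step]


# Hypothetical dataset refreshed six hours ago, with an assumed one-day tolerance.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_updated = now - timedelta(hours=6)
print(is_fresh(last_updated, max_age=timedelta(days=1), now=now))  # True

# Hypothetical sampling frame of 1,000 customer IDs, sampled 1-in-10.
frame = list(range(1000))
sample = systematic_sample(frame, step=10, seed=42)
print(len(sample))  # 100
```

The staleness threshold is the “how old is still okay” decision expressed as a parameter, so it can be set deliberately per dataset rather than left implicit. The sampling function only gives a sound sample if the frame itself lists the whole population of interest in an order unrelated to the quantity being measured.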

Work on creating a culture focused on data

Integrate the data science team in your company

Skill development and autonomy

Moving forward

Aorist

We are a data studio in Wellington, New Zealand. We are passionate about everything data, Alluxio to Zenodo.

scie.nz
