Which is the proper focus?

We find ourselves in the era of big data, where vast, continuous streams of heterogeneous human-related data are collected by digital means and commonly summarized by the 5V characterization: Volume (the size of the data), Variety (the diversity of the content), Velocity (the rate at which it is produced), Veracity (the quality of the content) and Value (its business impact). These humongous data sets are collected through many channels, including computer networks, social media profiles, web browsing histories, mobile phone sensors, Internet of Things (IoT) devices, video from (self-)driving and robotic applications, our commercial transactions and more.

The long tail of data.
  • In most cases, small data is the right data for the problem at hand [5];
  • Small data is more available, precise, and complete;
  • Small data is driving the Internet of Things [6];
  • Small data is about people, small groups, and communities;
  • Small data describes every person in each context;
  • Small data can be understood and interpreted by humans;
  • Most innovations are triggered by small data [7].
  • Resolution and Identity — How fine-grained is the data and how identifiable is each item?
  • Relational — How easy is it to conjoin different datasets through common fields or encodings that are part of the data?
  • Flexibility — How easy is it to extend the data (e.g., adding new fields) and scale it in size?
  • Privacy — How does the data relate to people?
Table 1. Comparison of characteristics of small data and big data (partly based on [11]).

World-class expert on data science and algorithms. Research Professor at IEAI, Northeastern University. Former VP of Research at Yahoo Labs. ACM & IEEE Fellow.
