Analytics is all about gaining a holistic understanding (a 360-degree view of the historical, current, and forecasted aspects) of the subject under study. Data is the foundation for all analytics. When you study a subject, you are essentially studying its various attributes. The type associated with each of these attributes (called its data type) drives the whole approach to analytics. Hence, it is essential to understand the general types of data before performing any analytics. This is why most analytics literature starts with the “types of data”.
Here I attempt to present an infographic (and a detailed examination below) that gives a holistic view of the types of data in self-service BI, reporting, and analytics (aka Citizen Data Science), while also embracing the new data types that have emerged with the advent of big data technologies.
Categorical Vs Numeric attributes: Data is the result of making observations on attributes of a subject of interest. Technically, attributes are also referred to as variables. At a very high level, attributes of a subject under study can be either “categorical” or “numeric”. Values (or observations) of a categorical attribute consist of names or labels, e.g. the numbers on sports players’ jerseys, which identify players rather than measure anything. Values of a numeric attribute consist of numbers representing counts or measurements, e.g. the age (in years) of respondents to a survey.
Qualitative Vs Quantitative Vs Pseudo-Quantitative Data: Data collected about these categorical or numeric attributes can be qualitative, quantitative, or pseudo-quantitative. Data collected about numeric attributes are quantitative, whereas data collected about categorical attributes are qualitative. However, note from the infographic that data about categorical attributes (fundamentally qualitative in nature) can be translated into quantitative data using a technique called “Counting”, and quantitative data can be translated into qualitative data using a technique called “Binning”.
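As a minimal sketch of “Counting” and “Binning”, the snippet below turns categorical labels into frequencies and numeric ages into labelled groups. The survey responses, the ages, the `age_group` helper, and its cut points (18 and 65) are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical survey responses (a categorical attribute).
responses = ["yes", "no", "yes", "undecided", "yes", "no"]

# Counting: qualitative labels become quantitative frequencies.
counts = Counter(responses)

# Hypothetical ages in years (a numeric attribute).
ages = [12, 25, 37, 41, 68, 17]


def age_group(age):
    """Binning: map a number to a labelled range (cut points are arbitrary)."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


# Binning: quantitative values become qualitative (here, ordered) labels.
groups = [age_group(a) for a in ages]

print(dict(counts))
print(groups)
```

Note that binning discards detail: once 25 and 41 are both labelled “adult”, the original ages cannot be recovered from the labels.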
Discrete Vs Continuous Quantitative Data: Quantitative data can be further classified as Discrete or Continuous.
Discrete Quantitative Data: A quantitative attribute results in discrete data when the number of possible values of the attribute is either finite or countably infinite (i.e., the values correspond to isolated points on the number line). E.g.: The numbers of eggs that hens lay are discrete data because they represent counts. Discrete data usually arise when observations are determined by counting.
Continuous Quantitative Data: A quantitative attribute results in continuous data if the possible values of the attribute form an interval on the number line. E.g.: The amounts of milk from cows are continuous data because they can take any value over a continuous span. Continuous data usually arise when observations are determined by measuring, as opposed to counting.
Nominal Vs Ordinal Qualitative Data: Based on the measurement scales, Qualitative or Categorical data can be further classified as nominal or ordinal.
- Nominal Scale: A qualitative attribute is said to be defined on a nominal scale if the observations made on such an attribute (i.e. the data) are unordered. E.g.: Survey responses of “yes”, “no” or “undecided”.
- Ordinal Scale: A qualitative attribute is said to be defined on an ordinal scale if the observations made on such an attribute result in data that can be ordered, but the differences between the observations either cannot be determined or are meaningless. E.g.: The course grades assigned by a professor can be ordered, but the difference between the grades cannot be determined. For instance, we know that grade A is higher than grade B (so there is an ordering), but we cannot subtract grade B from grade A (so the difference cannot be found).
Ordinal data enable relative comparisons, but they do not give the magnitudes of the differences. Usually ordinal data should not be used for calculations such as an average, but this guideline is sometimes sensibly violated (such as when we use letter grades to calculate a grade-point average).
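The sketch below illustrates the ordinal case with letter grades. The grade ordering, the sample grades, and the 4-point mapping in `GRADE_POINTS` are assumptions made for illustration; the mapping is the step that treats the gaps between grades as equal, which the ordinal scale itself does not guarantee.

```python
from statistics import median_low

# Assumed ordering of letter grades, lowest to highest.
GRADE_ORDER = ["F", "D", "C", "B", "A"]
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}

# Hypothetical grades for one student.
grades = ["B", "A", "C", "A", "D"]

# Relative comparison: meaningful on an ordinal scale.
a_is_higher_than_b = RANK["A"] > RANK["B"]

# Sorting and the median rely only on order, so they are safe to use.
ordered = sorted(grades, key=RANK.get)
median_grade = GRADE_ORDER[median_low(RANK[g] for g in grades)]

# Grade-point average: assumes equal spacing between grades (a convention).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
gpa = sum(GRADE_POINTS[g] for g in grades) / len(grades)

print(ordered, median_grade, gpa)
```

The median needs no assumption beyond the ordering, whereas the average depends entirely on the numbers chosen in `GRADE_POINTS`.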
Interval Vs Ratio scale Quantitative Data: Based on the measurement scales, Quantitative or Numeric data can be further classified as below:
- Interval Scale: A quantitative variable is said to be defined on an interval scale if the differences between measurements of the variable can be compared meaningfully, but the ratios of the measurements cannot. This scale of measurement does not have a natural zero starting point. E.g.: Body temperatures of 98.2°F and 98.6°F. These values are ordered, and we can determine their difference of 0.4°F. However, there is no natural zero starting point. The value of 0°F might seem like a starting point, but it is arbitrary and does not represent the total absence of heat. Another example is the years 1947 and 2020 (time did not begin in the year 0, so the year 0 is arbitrary instead of being a natural zero starting point representing “no time”).
- Ratio Scale: A quantitative variable is said to be defined on a ratio scale if both the differences between measurements and the ratios of the measurements of the variable can be compared meaningfully. This scale of measurement has a natural zero starting point.
The following are examples of data at the ratio level of measurement. Note the presence of the natural zero value, and also note the use of meaningful ratios such as “twice” and “three times”. Distances (in km) traveled by cars (0 km represents no distance traveled, and 400 km is twice as far as 200 km). Prices of college textbooks ($0 does represent no cost, and a $100 book does cost twice as much as a $50 book).
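One way to check which scale a variable is on is to change its unit and see what survives. The sketch below (the sample values are hypothetical; the conversion formulas are the standard ones) shows that a ratio of temperatures changes when Fahrenheit is converted to Celsius, while a ratio of temperature differences, and a ratio of distances, does not.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion; note the shift by 32 (no shared zero point)."""
    return (f - 32.0) * 5.0 / 9.0


def km_to_miles(km):
    """Pure rescaling; zero stays zero."""
    return km * 0.621371


# Interval scale: the ratio of two temperatures depends on the unit.
ratio_f = 80.0 / 40.0
ratio_c = fahrenheit_to_celsius(80.0) / fahrenheit_to_celsius(40.0)

# ...but ratios of differences are preserved across units.
diff_ratio_f = (80.0 - 40.0) / (60.0 - 40.0)
diff_ratio_c = (fahrenheit_to_celsius(80.0) - fahrenheit_to_celsius(40.0)) / (
    fahrenheit_to_celsius(60.0) - fahrenheit_to_celsius(40.0)
)

# Ratio scale: "twice as far" holds in any unit.
ratio_km = 400.0 / 200.0
ratio_mi = km_to_miles(400.0) / km_to_miles(200.0)

print(ratio_f, ratio_c)            # the two ratios disagree
print(diff_ratio_f, diff_ratio_c)  # the two ratios agree
print(ratio_km, ratio_mi)          # the two ratios agree
```

So “80°F is twice as hot as 40°F” is an artifact of the unit, whereas “400 km is twice as far as 200 km” is a property of the distances themselves.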
Temporal Data: One more conventional, yet very prominent and specialized, type of data in many analytics initiatives is “Temporal Data”. It is data that represents a state in time and that varies over time (i.e., data that specifically refers to times or dates). Temporal data is characterized by its data elements being a function of time.
Cross Sectional Vs Longitudinal Temporal Data:
- Cross Sectional (aka Snapshot): Data collected at a single point in time are called Cross Sectional Data (aka snapshot data).
- Longitudinal (aka Time Series): Data collected over a period of time, such as months, quarters or years, are called Time Series Data (aka longitudinal data).
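The two views can be sketched as two different slices of the same set of observations. The store names, dates, and sales figures below are made up, and `snapshot` and `time_series` are hypothetical helper names used only for this illustration.

```python
from datetime import date

# Hypothetical observations: (store, month, sales).
observations = [
    ("north", date(2020, 1, 1), 120),
    ("south", date(2020, 1, 1), 95),
    ("north", date(2020, 2, 1), 130),
    ("south", date(2020, 2, 1), 90),
    ("north", date(2020, 3, 1), 128),
    ("south", date(2020, 3, 1), 102),
]


def snapshot(rows, when):
    """Cross-sectional view: every subject at one point in time."""
    return {store: sales for store, day, sales in rows if day == when}


def time_series(rows, store_name):
    """Longitudinal view: one subject over time, in chronological order."""
    return sorted((day, sales) for store, day, sales in rows if store == store_name)


print(snapshot(observations, date(2020, 2, 1)))
print(time_series(observations, "north"))
```

A snapshot supports comparisons across subjects, while a time series supports questions about trend and change for a single subject.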
Geo-Spatial Data: In addition to the above conventional data types, another data type that has been around for quite some time but has gained prominence with the advent of big data technologies is Geo-Spatial (or simply Spatial) data. Data about a physical object that can be represented by numerical values in a geographic coordinate system are called Geo-Spatial Data.
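As an illustration of why such data need their own arithmetic, the sketch below computes the great-circle distance between two latitude/longitude points with the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km, which is a common approximation and not an exact model; the function name `haversine_km` and the sample coordinates are chosen here for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates of two cities (illustrative values).
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
print(round(haversine_km(*paris, *london), 1), "km")
```

Simply subtracting the coordinates would give a difference in degrees, not a distance, which is one reason coordinates are not handled like ordinary numeric columns.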
Binary Data: Finally, not to be missed, there is binary data: collections of digital bits (0s and 1s) that encode images, videos, etc.
Temporal, Geo-Spatial, and Binary data are classified as pseudo-quantitative because a good amount of specialized arithmetic and analytics can be performed on them, but these operations are not as straightforward as those on conventional quantitative data.
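A small sketch of what “specialized arithmetic” can look like for temporal and binary data, using only the Python standard library. The dates and the four-pixel grayscale “image” are hypothetical; real images and videos are normally decoded with dedicated libraries.

```python
from datetime import date

# Temporal: subtracting two dates gives a duration, not another date.
start, end = date(2020, 1, 1), date(2020, 3, 1)
elapsed = end - start  # a datetime.timedelta

# Adding two dates is not defined, unlike adding two ordinary numbers.
try:
    start + end
    dates_can_be_added = True
except TypeError:
    dates_can_be_added = False

# Binary: a hypothetical 2x2 grayscale image stored as raw bytes (0 to 255).
pixels = bytes([0, 64, 128, 255])

# Byte-level arithmetic: invert the image and compute its mean brightness.
inverted = bytes(255 - p for p in pixels)
mean_brightness = sum(pixels) / len(pixels)

print(elapsed.days, dates_can_be_added)
print(list(inverted), mean_brightness)
```

In both cases the values are numbers underneath, but only some operations make sense, and their results often have a different type from the inputs.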
Happy Learning..!!