Why is SQL one of the most powerful tools in the arsenal of an Analyst?

Published by


Out of information analysis, comes wisdom. It’s this information analysis that is one of the important reasons that created the need for storage of data. The world of databases was sprouted to address this need of data storage and many languages were born to interact and deal with the data stored in these databases. But one language has survived, stood the test of time, consistently evolved and thrived – SQL is that one language.

A database is a logically coherent collection of data. Most of the enterprise data today sits in relational databases where the data is organized in tables. Tables are the fundamental objects in relational databases that store data in rows arranged into columns. This means that a column in a relational database table contains similar type of data and a row is one instance of all the columns in the table. The collection of software components that allow you to create, manipulate and manage the relational databases is called Relational Database Management System (RDBMS). Examples of RDBMS include Oracle, Microsoft SQL Server, IBM DB2, MySQL, Teradata, etc.,

SQL was initially introduced as the language for managing data held in a relational database management systems.

History of SQL:

SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query Language). The acronym SEQUEL was later changed to SQL because “SEQUEL” was a trademark of the UK-based Hawker Siddeley aircraft company.

Databases space has undergone significant technological advances over the last couple of decades with innumerably distinct storage architectures. JSON databases, Key-Value pair databases, Document databases, HDFS are just a few to name. These significantly distinct architectures lead to the creation of several database languages closely coupled to the underlying architectures, to interact and deal with the data stored in these databases. Amid stiff competition from these new languages, time and again SQL has survived, stood by the test of time and competition, continuously evolved since last four decades consistently ensuring relevance to the contemporary demands of data access, analysis and reporting and thrived to become the most inevitable go-to language for data analysis.

What makes SQL apparently simple but so much powerful and flexible?

SQL is Declarative: SQL is the 4th generation language that is declarative in nature. i.e., unlike procedural languages, SQL only expects you to mention what you want and never about how you want the database to work to achieve what you want. The declarative nature of SQL frees-up the user from worrying about technical complexities involved in indicating how to interact with data in the databases and lets the user focus on what is needed. It’s this declarative nature of the language that drives its simplicity.

SQL has strong mathematical foundations in set theory: Relational algebra is a consistent framework for organizing and manipulating sets of data. SQL concisely and intuitively expresses this mathematical system.

SQL has technically formidable design: In comparison to older read/write APIs like ISAM or VSAM, SQL offers two main advantages: First, it introduced the concept of accessing many records with one single command, and second, it eliminates the need to specify how to reach a record, e.g.: with or without an index.

SQL is Flexible: The pure Relational model enthusiasts may not like to appreciate the fact that SQL deviates from its underlying Relational model considerably. Just as any other database language, SQL is also closely coupled with relational database architectures, but it’s not the perfect implementation of the relational model. SQL deviates from its theoretical foundations (i.e., the relational model and its tuple calculus) in considerably good number of ways. Simplest example is a Relation in strict sense is not expected to have duplicate records but a table (equivalent of Relation) in SQL can absolutely have duplicate records.

SQL is adopted at scale: SQL is so much powerful yet flexible that the vendors of databases with non-relational storage architectures have also started implementing an abstracted layer of SQL on top of their respective non-relational database architectures under the umbrella named “SQL on Hadoop ecosystem”, without much compromising on the underlying database performance. Below are a few such SQL implementations:

  • Apache HiveQL – an implantation of SQL layer on HDFS (Hadoop Distributed File System)
  • Cludera’s Impala – Cloudera Impala is Cloudera’s open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.
  • Facebook’s Presto – an open source distributed SQL query engine.
  • Oracle’s Big Data SQL
  • IBM’s BigSQL
  • Apache HAWQ
  • Google’s BigQuery
  • MapR’s Apache Drill SQL – is an open-source ANSI SQL – 2003 compliant SQL framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets.
  • Microsoft Azure Data Explorer SQL

In the recent times, SQL has also been accepted as the de facto data processing language for NoSQL category of databases like Hadoop distributed file system in the name of HiveQL.

SQL is independent of the storage model: While some might think of SQL as synonymous with relational databases, many of the new adopters of SQL listed above are built on non-relational data. SQL is well on its way of becoming champion of diversity and inclusion in the arena of Data Analysis.

SQL is relevant even in the modern era of self-service BI and Analytics platforms: When an analyst doesn’t really have enough time to model the data to fit the self-service BI platforms, the best choice at hand would be to use SQL to quickly query the data to answer any business questions. Also when the need is to perform calculations on huge amount of data, good practice is to push most calculations to the underlying sources. And most of the today’s structured enterprise data sits in Relational DBMSs’ where the standard language to deal with data is SQL.

SQL is a Standard: SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987 and has been continuously revised since then. Until 1996, the National Institute of Standards and Technology (NIST) data management standards program certified SQL DBMS compliance with the SQL standard. Vendors now self-certify the compliance of their products.

SQL has support for diverse data types including but not limited to the below:

  • XML
  • JSON
  • Spatial data
  • Temporal data
  • Graph data

SQL continuously evolved: SQL has consistently ensured relevance to the contemporary demands of data analysis and continuously evolved to today’s state with a compelling and rich array of analysis features and functions thus becoming powerful.

  • Standard Aggregations
  • Window Aggregations
  • OLAP Operations
  • Reporting Functions
  • Ranking Functions
  • Row Offset Functions
  • Pivot and Unpivot Operations
  • Ordered Set Functions
  • Linear Recursion
  • Row Pattern Recognition
  • Polymorphic Table Valued Functions

and a lot more…

SQL has a lot more in store for the times ahead:

  • Standardized Support for Multi-Dimensional Analysis
  • Standardized Support for Graph oriented data

SQL is naturally closest to Data Analysis: More importantly, in addition to all of the above, given some data, if you are asked to analyze it, most likely you will end up organizing the data in tabular form for further analysis. Tables are the closest form to the human intuition of organizing and analyzing data and SQL is designed to be the closest language to the concept of tables thus making it naturally closest to Data Analysis.

SQL has relatively much shallow learning curve: Getting started with learning SQL is very easy. There are legions of analysts who already know SQL.

The top two open secret industrial strengths of SQL are – Declarative nature and Table data structure oriented.

 Its adoption at scale in the industry clearly shows that SQL has evolved from being just another database query language to a flexible and comprehensive framework for data analysis.

This brings us to the conclusion and justifies that SQL is truly the natural language for data analysis and one of the most powerful tools in the arsenal of an analyst. So, it’s time to start sharpening your SQL analytics skills..!!

Glance through the free resources page to take more learnings on SQL. Engage me for SQL Analytics Trainings in your Organizations.

One response to “Why is SQL one of the most powerful tools in the arsenal of an Analyst?”

  1. Vishnu Avatar

    Good content!!!

Leave a Reply