Here is an interesting business data analysis problem that I have come across multiple times in the recent times. Since I have seen this problem surfacing multiple times and the solution boils down to the same technique, it becomes an SQL Design Pattern and I named this baby as “Complement Negation”. Have shared the justification for the name at the end of this write-up.
Now, let me explain the scenario with an example using some fictitious data.
We have a table named “Projects” with the status details of various projects. Each project is given a unique project id. A Project may have multiple milestones where each milestone is identified by milestone id. The table also has the details of status of project and the status of each of the milestones. Status project and milestone can be either “closed” or “in-progress” (i.e., they are complementary).
Here we create the “Projects” table using below SQL:
—Step:1. Creating the table
CREATE TABLE Projects
(ProjectID VARCHAR(5)
, MilestoneID VARCHAR(5)
, ProjectStatus VARCHAR(15)
, MilestoneStatus VARCHAR(15)
PRIMARY KEY (ProjectID, MilestoneID)
);
Now let’s populate the table with some fictitious data using the below SQL:
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P201’, ‘MS01’, ‘closed’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P201’, ‘MS02’, ‘closed’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P202’, ‘MS01’, ‘in-progress’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P202’, ‘MS03’, ‘in-progress’, ‘in-progress’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P202’, ‘MS02’, ‘in-progress’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P203’, ‘MS01’, ‘in-progress’, ‘in-progress’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P204’, ‘MS01’, ‘closed’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P204’, ‘MS02’, ‘closed’, ‘in-progress’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P204’, ‘MS03’, ‘closed’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P204’, ‘MS04’, ‘closed’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P204’, ‘MS06’, ‘closed’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P204’, ‘MS07’, ‘closed’, ‘in-progress’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P207’, ‘MS04’, ‘closed’, ‘closed’);
INSERT INTO Projects (ProjectID, MilestoneID, ProjectStatus, MilestoneStatus)
VALUES(‘P207’, ‘MS06’, ‘closed’, ‘in-progress’);
So, below is how the table looks:
SELECT * FROM Projects;
Business rule is that a project is considered to be closed only when all the milestones under the project are closed.
However, the table has some records that does not satisfy this rule. e.g.: Project – P204 has two milestones (viz., MS02, MS07) whose status is “in-progress” but the project status itself is “closed”. Similarly, Project – P207 has one milestone (viz., MS06) whose status is “in-progress” but the project status itself is “closed”. These cases that doesn’t satisfy the business rule are considered abnormal cases/Projects.
Challenge at hand: Now the challenge at hand is to query the details of all abnormal projects (i.e., details or projects where project status is “closed” but the status of one or more milestones under the project is “in-progress”) and report the complete data.
Solution: Below is the SQL used to query/retrieve all the abnormal packages.
WITH temp AS (
SELECT ProjectID, MilestoneID, ProjectStatus, MilestoneStatus
, SUM(CASE WHEN MilestoneStatus = ‘in-progress’
THEN 1
ELSE 0
END) OVER(PARTITION BY ProjectID) AS Count_Indicator
FROM Projects
)
SELECT ProjectID, MilestoneID, ProjectStatus, MilestoneStatus
FROM temp
WHERE Count_Indicator > 0
AND ProjectStatus = ‘closed’;
Results: The results appear as below:
Explanation: First, in the CTE, we retrieve all the details of projects that need to be reported. Then we derive a column that counts only the number of “in-progress” milestones under each project and name this derived column as, say Count_Indicator.
Now we retrieve the records where Project Status is “closed” but with the count of “in-progress” milestones (indicated by Count_Indicator derived column) being greater than 0 (i.e., there exists at least one milestone that is “in-progress”). This is nothing but retrieving the records where project status appears “closed” but with one or more underlying milestones still being “in-progress”.
Name Justification: The word Complement – since the status’ “closed” and “in-progress” are complementary (i.e., # of closed projects or milestones within each project plus # of in-progress projects or milestones within each project equals the total # of projects or milestones within each project respectively). The word Negation – since we test for status of one field for “closed” and the status of the other field for “in-progress” (i.e., “closed” test is negation of “in-progress” or vice versa).
Hence I feel the name for this SQL Design Pattern is justified to be “Complement – Negation”.
Hope you enjoyed reading this post. Do share your comments, suggestions, questions or clarifications, if any. Have great time until my next post..!!
WordPress Tags: CTE,common table expression,Microsoft,sql server,t-sql,SQL,Complement,Negation,Design,Pattern,Here,data,analysis,times,solution,technique,justification,scenario,example,status,Project,Step,CREATE,TABLE,ProjectID,VARCHAR,MilestoneID,ProjectStatus,MilestoneStatus,PRIMARY,INSERT,INTO,VALUES,SELECT,FROM,records,cases,Challenge,packages,CASE,ELSE,OVER,PARTITION,Count_Indicator,WHERE,Results,Explanation,column,Name,word,Complementary,equals,itself
Leave a Reply