In this post, I’m going to guide you how to crack SQL interview questions. I’ll use MS sql as primary sql tool to solve the questions, since the MS sql is more frequently pop out on job description from Linkedin.

Who need this post?

Some junior data related job seekers. Not knowing database too much or try to learn with sql question. Unlike the real problem, the question won’t touch more complicated issue around big data or database principles. However, keep in mind big O issue is also important. It can be common following questions.

Getting start To make sure you already pass these 2 web site

w3school
hackrank Then… Let’s rockroll on leetcode. For most of questions, the database questions on leetcode need to pay for play.

Clause

What’s minimum unit in RDBMS? It’s Clause. Most of operations is going to create/combine multiple clause.

order by
top
group by
having

Key Question Warning !!! What’s difference between WHERE and HAVING

ORDER BY

Rows cluase
MS SQL special

don’t allow wher PAIR in SELECT .

More than one table ?!

leetcode 608, 1350

We all need to answer this question first “Why DB needs multiple table?”

Having 1 giant table is called, Denormalization. Breaking that giant table into multiple tables is called, Normalization. The appeal of Database Normalization is that is prevents:

Insert Anomalies: When you need to insert some data, but all required values are not known, so it is impossible to conduct the insert.

Update Anomalies: When 1 single attribute of something changes in the real world, but you have to update multiple rows, multiple columns, or both. There is risk in these rows and/or columns getting out of sync.

Delete Anomalies: You need to delete some data, but there is data in the rows that you want to keep. You either have to forego the delete, or conduct the delete and lose the data.

Therefore, the query in most of relational database is need to twinkle multiple table. There’s primary key to connect two tables, sometimes, two table are the same one.

UNION/EXCEPT/INTERSECT/JOIN WITH statement For every coder, to make the code easier to understand is holy grail. with statement can divide the sub task with clean boarder. Last part can organize previous parts into overall result.

-- Define the CTE expression name and column list.  
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)  
AS  
-- Define the CTE query.  
(  
    SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear  
    FROM Sales.SalesOrderHeader  
    WHERE SalesPersonID IS NOT NULL  
)  
-- Define the outer query referencing the CTE name.  
SELECT SalesPersonID, COUNT(SalesOrderID) AS TotalSales, SalesYear  
FROM Sales_CTE  
GROUP BY SalesYear, SalesPersonID  
ORDER BY SalesPersonID, SalesYear;

Window function

As a data user, window function is inevitable part of analysis. Why it so important in interview? It can be daily function on even simple task.

Aggregation

If we assume all the data as giant single window. The aggregation is the way to extract the statistical insight on data set.

SUM/AVG/MIN/MAX They are easy to understand.
Median Hohohoho! This is not cute at all. Check Leetcode569. Take out the median, the question is super easy. Ref logica control
VAR/STDEV More stats :)

Set the window

Easy window questions : Leetcode 185

FUNCTION() OVER ([ <PARTITION BY clause> ] 
	  [ <ORDER BY clause> ] 
	  [ <ROW or RANGE clause> ] )

ROW and Range clause

UNBOUNDED PRECEDING
CURRENT ROW
UNBOUNDED FOLLOWING
?FOLLOWING
?PRECEDING

ROW_NUMBER

ROW_NUMBER ( ) 
OVER ( [ PARTITION BY value_expression , ... [ n ] ] order_by_clause )

LEAD/LAG

LEAD/LAG(columns, step) 
OVER ([ PARTITION BY value_expression , ... [ n ] ] order_by_clause )

LIMIT

SQL server doesn’t have limit. We can use alternative way to acieve the same result. Use TOP

ROW

```sql
SELECT TOP 1 * FROM table_Name 
ORDER BY unique_column DESC 
``` - OFFSET + FETCH 
```sql
OFFSET n ROWS
FETCH  NEXT m ROWS  ONLY
```

RANK & DENSERANK

RANK gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped. So, if you have 3 items at rank 2, the next rank listed would be ranked 5.

DENSE_RANK again gives you the ranking within your ordered partition, but the ranks are consecutive. No ranks are skipped if there are ranks with multiple items.

number	DENSERANK	RANK
1	1	1
1	1	1
2	2	3
3	3	4

Unique data type

Date
String
Enumerate
Date data
DATEADD(datepart, number, date)
DATEDIFF ( datepart , startdate , enddate)
DATEPART ( datepart , date )
BETWEEN can be apply to date format. Berween is inclusive both bound.

Enumerate

Leetcode 262

CAST & CONVERT

They are the same. Try use CONVERT only. Keep it simple and easy to rememver.

-- CAST Syntax:  
CAST ( expression AS data_type [ ( length ) ] )  
  
-- CONVERT Syntax:  
CONVERT ( data_type [ ( length ) ] , expression [ , style ] )

Example :

CONVERT(int, ListPrice)

String

Pick only LEFT or RIGHT

SELECT LEFT('2017-03-31',7)

The string trim down as “2017-03”

LIKE pattern Easy: Leetcode 1527 Mid: Leetcode 1517

WHERE condition LIKE '%SOMEKEYWORD%'

If you think that’s the regular express? nah… half and half. Unlike the regular expres(cheat sheet), use “+” as infinity possible .LIKE rule from MS sql

% mean the one or more characters
_ mean one characters Here’s some examples:
1. LIKE '%@leetcode' 12341dsafasdfe4@leetcode is valid for pattern. [:alpha:]+@leetcode is the same pattern for regex

Conditional control

IF ELSE statement

Leetcode 608

iff(regulation, return1,return2)

Here’s the example show, if else result can be directly use in other function.

SELECT Request_at as Day,
       ROUND(COUNT(iff(Status != 'completed', TRUE, NULL)) / COUNT(*), 2) AS 'Cancellation Rate'
FROM Trips
WHERE (Request_at BETWEEN '2013-10-01' AND '2013-10-03')
      AND Client_id NOT IN (SELECT Users_Id FROM Users WHERE Banned = 'Yes')
GROUP BY Request_at;

WHEN case end

If the if-else case more than 2 split point, when/case is easier

WHEN statement
CASE case1 THEN value_1
CASE .... THEN value_....
ELSE value_else
END

PIVOT

When the question ask about column and rows transfer, pivot is very useful to deal with. pivot: rows-> Columns unpivot -> columns ->rows

Leetcode 1179

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| revenue       | int     |
| month         | varchar |
+---------------+---------+

+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+

every Row has months label, we expect it as new columns. It’s the typical question we can easier to get from PIVOT

SELECT , [first pivoted column] AS , [second pivoted column] AS , ... [last pivoted column] AS FROM (

; ## Important! [] : Represent the original column cotains the future column headers Example : ```sql SELECT id, Jan AS Jan_Revenue, Feb AS Feb_Revenue, Mar AS Mar_Revenue, Apr AS Apr_Revenue, May AS May_Revenue, Jun AS Jun_Revenue, Jul AS Jul_Revenue, Aug AS Aug_Revenue, Sep AS Sep_Revenue, Oct AS Oct_Revenue, Nov AS Nov_Revenue, Dec AS Dec_Revenue FROM Department PIVOT ( MAX(revenue) FOR month IN (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) ) AS MonthsRevenue ``` # Function Last part in this short introduction. - Variable @/ DECLARE ```sql CREATE FUNCTION dbo.ufnGetInventoryStock(@ProductID int) RETURNS int AS BEGIN DECLARE @ret int; SELECT @ret = SUM(p.Quantity) FROM Production.ProductInventory p WHERE p.ProductID = @ProductID AND p.LocationID = '6'; IF (@ret IS NULL) SET @ret = 0; RETURN @ret; END; ``` ## Variable declaration DECLARE @variable_name datatype

Nail down the SQL interview question

Tight arsenal to crack the questions