Two people working at their computer at their work station.

SQL Practice Problems to Train Your Teams

Prepare your team for advanced SQL tasks

In today's data-driven world, the ability to harness the power of data is an invaluable skill. And SQL is the language of databases. It's the key that unlocks the treasure troves of information stored in data repositories, allowing us to retrieve, manipulate, and analyze data with precision and efficiency.

To help your team develop a more thorough understanding of SQL, we have compiled a series of five SQL practice problems. These problems cover a range of advanced scenarios, allowing your team to develop SQL knowledge that will be useful in practical contexts.

Problem #1: SQL query processing with grouping, aggregation, filtering, and sorting

Consider a simplified database system with two tables: Employees and Departments. The Employees table contains information about employees, including their EmployeeID, EmployeeName, Salary, and DepartmentID. The Departments table contains information about departments, including DepartmentID and DepartmentName.

Write a step-by-step description of the logical query processing of the following SQL query:

SQL query that depicts logical query processing.

Explain each step of how this query is processed logically, including any temporary results or intermediate stages.

Solution

The SQL query provided retrieves the average salary (AvgSalary) of employees in each department, but only for departments where the average salary is greater than $50,000. The results are sorted in descending order of the average salary.

Here's a step-by-step explanation of the logical query processing:

  1. FROM Clause:
    • The first step is to identify the source tables for the query. In this case, we have the Departments table aliased as D and the Employees table aliased as E.
    • The JOIN operation is used to combine the Departments and Employees tables based on the DepartmentID column, linking employees to their respective departments.
  2. WHERE Clause (if applicable): 
    • There is no WHERE clause in this query, so we move on to the next step.
  3. GROUP BY Clause:
    • The GROUP BY clause is used to group the result set by the DepartmentName.
    • This means that all rows with the same DepartmentName will be treated as a single group.
  4. SELECT Clause:
    • In the SELECT clause, we calculate the average salary for each department using the AVG(Salary) function. The result is aliased as AvgSalary.
    • We also select the DepartmentName.
  5. HAVING Clause:
    • The HAVING clause is applied to the grouped results. It filters out groups (departments) where the average salary is not greater than $50,000.
    • Departments that do not meet this condition are excluded from the result.
  6. ORDER BY Clause: 
    • The final step is to sort the remaining rows by the AvgSalary in descending order. This means that departments with the highest average salary will appear at the top of the result set.
  7. Result:
    • The final result of the query is a list of department names (DepartmentName) and their corresponding average salaries (AvgSalary) where the average salary is greater than $50,000, sorted in descending order by average salary.

This query processing logic helps you understand how SQL queries are executed step by step to produce the desired result.

 

Problem #2: Calculating running total of sales using window functions

You have a database with a table named Sales that contains information about sales transactions. The Sales table has the following columns: SaleID, ProductID, SaleDate, and SaleAmount.

Write an SQL query that retrieves the following information for each sale:

SQL query that depicts information retrieval.

The running total of sales (RunningTotal) for each product, sorted by SaleDate.

Your query should return all the sales and include the running total of sales for each product at the time of that sale, ordered by SaleDate.

Data set

Sales Table:

SaleID ProductID SaleDate SaleAmount

101

1

2023-01-15

200.00

102

2

2023-01-16

375.00

103

3

2023-01-18

144.00

104

4

2023-01-20

600.00

105

5

2023-01-22

264.00

106

6

2023-01-23

504.00

107

7

2023-01-25

216.00

108

8

2023-01-27

462.00

109

9

2023-01-29

297.00

110

10

2023-01-30

266.00

Solution

You can use a window function, specifically the SUM window function, to calculate the running total of sales for each product. Here's the SQL query to solve this problem:

SQL query that depicts the calculation of running total of sales for each product.

This query:

  1. Retrieves the SaleID, SaleDate, SaleAmount, and ProductID columns from the Sales table.

  2. Uses the SUM window function to calculate the running total of SaleAmount for each product. The PARTITION BY ProductID clause ensures that the running total is calculated separately for each product, and the ORDER BY SaleDate clause specifies the order in which the running total is calculated (in ascending order of SaleDate).

  3. Orders the results by SaleDate to get the sales and their running totals in chronological order.

This query will give you a result set with each sale, its date, amount, product, and the running total of sales for that product at the time of the sale.

SaleID SaleDate SaleAmount ProductID RunningTotal
101 2023-01-15 200 1 200
102 2023-01-16 375 2 575
103 2023-01-18 144 3 719
104 2023-01-20 600 4 1319
105 2023-01-22 264 5 1583
106 2023-01-23 504 6 2087
107 2023-01-25 216 7 2303
108 2023-01-27 462 8 2765
109 2023-01-29 297 9 3062
110 2023-01-30 266 10 3328

Recommended course: Advanced SQL – Window Functions with Ami Levin

Problem #3: Query tuning and optimization for customer order count

You have a database with two tables: Customers and Orders. The Customers table contains information about customers, including their CustomerID, CustomerName, and Country. The Orders table contains information about orders, including OrderID, CustomerID (which links to the CustomerID in the Customers table), and OrderDate.

You are given the following SQL query, which is intended to retrieve the list of customer names along with the total count of orders they have made:

SQL query that depicts the retrieval of customer names and total count of orders.

However, you have noticed that this query is performing poorly, particularly when there are a large number of customers and orders in the database. Your task is to optimize this query to improve its performance.

Data set

Customers Table:

SaleID CustomerName Country
1 John Smith USA
2 Marry Johnson Canada
3 David Wilson UK
4 Sarah Brown Australia
5 James Lee Germany
6 Linda Davis France
7 Richard White Spain
8 Susan Miller Italy
9 Michael Taylor Brazil
10 Karen Anderson Mexico

Orders Table:

OrderID CustomerID OrderDate
1001 1 2023-10-01
1002 1 2023-10-02
1003 1 2023-10-03
1004 2 2023-10-04
1005 2 2023-10-05
1006 3 2023-10-06
1007 3 2023-10-07
1008 3 2023-10-08
1009 4 2023-10-09
1010
4 2023-10-10
1011 4 2023-10-11
1012 5 2023-10-12
1013 5 2023-10-13
1014 5 2023-10-14
1015 5 2023-10-15
1016 6 2023-10-16
1017 6 2023-10-17
1018 7 2023-10-18

Solution

Here are some optimization steps you can take:

Indexing: Ensure that there are appropriate indexes on the join columns (CustomerID in both tables) to speed up the join operation.

Selecting Only Necessary Columns: In the SELECT clause, you are only interested in the CustomerName and the count of orders. Therefore, you can rewrite the query to avoid selecting unnecessary columns. This can reduce the amount of data that needs to be processed.

SQL query that depicts the SELECT clause.

Aggregating with INNER JOIN: If the database allows it, you can use an INNER JOIN instead of a LEFT JOIN when you're only interested in customers who have placed orders. This can potentially optimize the query.

SQL query that depicts the INNER JOIN command.

Regular Database Maintenance: Perform regular database maintenance tasks like index reorganization and statistics updates to ensure the database is running efficiently.

Query Profiling: Use query profiling tools to analyze the execution plan of the query and identify performance bottlenecks. This can help you see where the query might be consuming the most resources and where you can make further improvements.

Optimizing this query involves database-specific considerations, but these general steps can help you get started. The goal is to reduce the amount of data processed and to make efficient use of indexes and joins.

As with any optimization scenario, effective query optimization may involve trade-offs depending on your specific use case and database system.

 

CustomerName TotalOrders
John Smith 3
Mary Johnson 2
David Wilson 3
Sarah Brown 3
James Lee 4
Linda Davis 2
Richard White 5
Susan Miller 3
Michael Taylor 3
Karen Anderson 3

Problem #4: Retrieving top-selling books by sales quantity

You are developing a web-based e-commerce application for a bookstore. You need to write an SQL query to retrieve a list of the top-selling books from the database.

Each book's sales data is stored in the Books and Sales tables. The Books table contains information about the books, including BookID, Title, Author, and Price. The Sales table contains information about sales, including SaleID, BookID, Quantity, and SaleDate.

Write an SQL query to retrieve the top 10 best-selling books by the total number of copies sold, along with their titles, authors, and total sales quantity. Ensure that the list is sorted in descending order of sales quantity.

Data set

Books Table:

BookID Title Author Price
1 "The Adventure" "John Smith" 20.00
2 "Mystery Unveiled" "Emily Davis" 25.00
3 "Secrets of Nature" "Michael Clark" 18.00
4 "Tech Titans" "Sophia Brown" 30.00
5 "Love in Paris" "David Wilson" 22.00
6 "The Lost Key" "Lucy Turner" 28.00
7 "Time Travelers" "Andrew Green" 24.00
8 "Beyond the Stars" "Olivia White" 21.00
9 "The Enigma" "Henry Adams" 27.00
10 "Hidden Treasures" "Linda Hall" 19.00
11 "The Puzzle Box" "George Parker" 23.00
12 "Tales of Wonder" "Emma Lewis" 26.00
13 "Infinite Realms" "Benjamin Scott" 30.00
14 "The Sapphire Crown" "Alice King" 22.00
15 "The Quantum Paradox" "William Reed" 25.00
16 "Dreams of Tomorrow" "Sophie Turner" 20.00
17 "The Secret Garden" "Daniel White" 18.00
18 "City of Miracles" "Sarah Brown" 29.00
19 "The Celestial Code" "Michael Harris" 27.00
20 "Echoes of Eternity" "Ella Parker" 24.00

Solution

You can use the following SQL query to solve this problem:

SQL query that depicts the retrieval of top-selling books by sales quantity.

This query:

  1. Joins the Books and Sales tables on the BookID column to associate each sale with the corresponding book.

  2. Groups the results by book title and author using the GROUP BY clause.

  3. Calculates the total sales quantity for each book using the SUM function.

  4. Orders the results in descending order of total sales quantity using the ORDER BY clause.

  5. Limits the result set to the top 10 records using the LIMIT clause.

The query will provide you with a list of the top-selling books in descending order of sales quantity, along with their titles, authors, and total sales quantity.

SaleID BookID Quantity SaleDate
116 16
25 2023-02-09
113 13 23 2023-02-04
108 8 22 2023-01-27
118 18 21 2023-02-13
104 4 20 2023-01-20
112 12 19 2023-02-02
106 6 18 2023-01-23
115 15 16 2023-02-07
102 2 15 2023-01-16
110 10 14 2023-01-30

Recommended course: Advanced SQL for Application Development with Dan Sullivan

Problem #5: Analyzing customer transactions for e-commerce insights

You are a data scientist working with a dataset of customer transactions for an e-commerce platform. The dataset contains two tables: Customers and Transactions. The Customers table contains information about customers, including CustomerID, CustomerName, and Country. The Transactions table contains information about individual transactions, including TransactionID, CustomerID (which links to the CustomerID in the Customers table), TransactionDate, and Amount.

Your task is to write an SQL query to analyze the data and find insights regarding customer behavior. Specifically, you need to find the following information:

  1. The total number of transactions in the dataset.

  2. The total revenue generated from all transactions.

  3. The average transaction amount.

  4. The country with the highest total revenue from transactions.

  5. The customer who has made the highest number of transactions.

Write an SQL query that retrieves this information from the Customers and Transactions tables.

Data set

Customers Table:

CustomerID CustomerName Country
101 Alice USA
102 Bob Canada
103 Carol USA
104 Dave UK
105 Eve Canada

Transactions Table:

OrderID CustomerID OrderDate Amount
101 1 2023-01-15 50.00
102 2 2023-01-16 30.00
103 3 2023-01-18 75.00
104 1 2023-01-20 45.00
105 4 2023-01-22 60.00
106 5 2023-01-23 40.00
107 1 2023-01-25 55.00
108 2 2023-01-27 35.00
109 3 2023-01-29 80.00
110
1 2023-01-30 70.00

Solution

You can use the following SQL query to obtain the desired insights:

 

SQL query that depicts the total number of transactions.
SQL query that depicts the total revenue from all transactions.
SQL query that depicts the average transaction amount.
SQL query that depicts the country with the highest total revenue.
SQL query that depicts the customer with the highest number of transactions.

This query:

  1. Counts the total number of transactions in the dataset.

  2. Calculates the total revenue from all transactions.

  3. Computes the average transaction amount.

  4. Identifies the country with the highest total revenue from transactions.

  5. Determines the customer who has made the highest number of transactions.

These insights can help data scientists better understand customer behavior and the financial performance of the e-commerce platform.

 

Metric Result
Total number of transactions 10
Total revenue generated $540.00
Average transaction amount $54.00
Country with the highest revenue USA ($375.00)
Customer with the most transactions Alice (4 transactions)

Recommended course: Advanced SQL for Data Scientists with Dan Sullivan