When should you use JOINs and CTEs instead of subqueries in SQL –
Table Definitions and Data
1. Customers
Table
This table stores information about our customers.
Customers
Data:
SQL
INSERT INTO Customers (CustomerID, CustomerName, CustomerAddress) VALUES
(1, 'Alice Wonderland', '123 Rabbit Hole, Fantasyland'),
(2, 'Bob The Builder', '456 Construction Site, Builderville'),
(3, 'Charlie Chaplin', '789 Silent Film St, Hollywood'),
(4, 'Diana Prince', '101 Themyscira Blvd, Paradise Island'),
(5, 'Eve Harrington', '202 Stage Door, Broadway');
2. Products
Table
This table stores information about the products we sell.
Products
Data:
SQL
INSERT INTO Products (ProductID, ProductName, UnitPrice) VALUES
(101, 'Laptop Pro', 1200.00),
(102, 'Gaming Mouse', 45.50),
(103, 'Mechanical Keyboard', 85.00),
(104, 'Webcam HD', 60.00),
(105, 'External SSD 1TB', 100.00),
(106, 'Monitor 27-inch', 250.00);
3. Orders
Table
This table stores information about customer orders.
Orders
Data:
SQL
INSERT INTO Orders (OrderID, CustomerID, OrderDate, TotalAmount) VALUES
(1001, 1, '2025-01-10', 1325.50),
(1002, 2, '2025-01-15', 250.00),
(1003, 1, '2025-02-01', 145.00),
(1004, 3, '2025-02-10', 100.00),
(1005, 1, '2025-03-05', 60.00),
(1006, 4, '2025-03-12', 1200.00),
(1007, 2, '2025-04-01', 250.00),
(1008, 5, '2025-04-05', 45.50),
(1009, 1, '2025-05-20', 85.00), -- Order after '2025-01-01'
(1010, 2, '2025-06-01', 250.00), -- Recent order for '2025-06-01' example
(1011, 3, '2025-06-01', 100.00); -- Recent order for '2025-06-01' example
4. OrderDetails
Table
This table stores the details of each item in an order.
OrderDetails
Data:
SQL
INSERT INTO OrderDetails (OrderDetailID, OrderID, ProductID, Quantity, UnitPrice) VALUES
(1, 1001, 101, 1, 1200.00),
(2, 1001, 102, 2, 45.50),
(3, 1001, 103, 1, 85.00),
(4, 1002, 106, 1, 250.00),
(5, 1003, 105, 1, 100.00),
(6, 1003, 102, 1, 45.00), -- Slightly different price from current product price for demonstration
(7, 1004, 105, 1, 100.00),
(8, 1005, 104, 1, 60.00),
(9, 1006, 101, 1, 1200.00),
(10, 1007, 106, 1, 250.00),
(11, 1008, 102, 1, 45.50),
(12, 1009, 103, 1, 85.00),
(13, 1010, 106, 1, 250.00),
(14, 1011, 105, 1, 100.00);
SQL to Create and Populate Tables
Here’s the full SQL script to create these tables and insert the data. You can run this in most SQL database systems (like MySQL, PostgreSQL, SQL Server, SQLite).
SQL
-- Drop tables if they already exist to ensure a clean slate
DROP TABLE IF EXISTS OrderDetails;
DROP TABLE IF EXISTS Orders;
DROP TABLE IF EXISTS Products;
DROP TABLE IF EXISTS Customers;
-- Create Customers Table
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100) NOT NULL,
CustomerAddress VARCHAR(200)
);
-- Create Products Table
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100) NOT NULL,
UnitPrice DECIMAL(10, 2) NOT NULL
);
-- Create Orders Table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT NOT NULL,
OrderDate DATE NOT NULL,
TotalAmount DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- Create OrderDetails Table
CREATE TABLE OrderDetails (
OrderDetailID INT PRIMARY KEY,
OrderID INT NOT NULL,
ProductID INT NOT NULL,
Quantity INT NOT NULL,
UnitPrice DECIMAL(10, 2) NOT NULL, -- Price at the time of order
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);
-- Insert data into Customers
INSERT INTO Customers (CustomerID, CustomerName, CustomerAddress) VALUES
(1, 'Alice Wonderland', '123 Rabbit Hole, Fantasyland'),
(2, 'Bob The Builder', '456 Construction Site, Builderville'),
(3, 'Charlie Chaplin', '789 Silent Film St, Hollywood'),
(4, 'Diana Prince', '101 Themyscira Blvd, Paradise Island'),
(5, 'Eve Harrington', '202 Stage Door, Broadway');
-- Insert data into Products
INSERT INTO Products (ProductID, ProductName, UnitPrice) VALUES
(101, 'Laptop Pro', 1200.00),
(102, 'Gaming Mouse', 45.50),
(103, 'Mechanical Keyboard', 85.00),
(104, 'Webcam HD', 60.00),
(105, 'External SSD 1TB', 100.00),
(106, 'Monitor 27-inch', 250.00);
-- Insert data into Orders
INSERT INTO Orders (OrderID, CustomerID, OrderDate, TotalAmount) VALUES
(1001, 1, '2025-01-10', 1325.50),
(1002, 2, '2025-01-15', 250.00),
(1003, 1, '2025-02-01', 145.00),
(1004, 3, '2025-02-10', 100.00),
(1005, 1, '2025-03-05', 60.00),
(1006, 4, '2025-03-12', 1200.00),
(1007, 2, '2025-04-01', 250.00),
(1008, 5, '2025-04-05', 45.50),
(1009, 1, '2025-05-20', 85.00),
(1010, 2, '2025-06-01', 250.00),
(1011, 3, '2025-06-01', 100.00);
-- Insert data into OrderDetails
INSERT INTO OrderDetails (OrderDetailID, OrderID, ProductID, Quantity, UnitPrice) VALUES
(1, 1001, 101, 1, 1200.00),
(2, 1001, 102, 2, 45.50),
(3, 1001, 103, 1, 85.00),
(4, 1002, 106, 1, 250.00),
(5, 1003, 105, 1, 100.00),
(6, 1003, 102, 1, 45.00),
(7, 1004, 105, 1, 100.00),
(8, 1005, 104, 1, 60.00),
(9, 1006, 101, 1, 1200.00),
(10, 1007, 106, 1, 250.00),
(11, 1008, 102, 1, 45.50),
(12, 1009, 103, 1, 85.00),
(13, 1010, 106, 1, 250.00),
(14, 1011, 105, 1, 100.00);
When to use JOIN
s instead of subqueries:
1. For combining data from multiple tables based on related columns: This is the primary use case for JOIN
s. When you need to link rows from two or more tables where there’s a logical relationship (e.g., Customers
and Orders
linked by CustomerID
), JOIN
s are the clear winner.
- Subquery equivalent (less efficient and harder to read):
SQL
SELECT CustomerName FROM Customers WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE OrderDate >= '2025-01-01');
- JOIN equivalent (better):
SQL
SELECT c.CustomerName FROM Customers c JOIN Orders o ON c.CustomerID = o.CustomerID WHERE o.OrderDate >= '2025-01-01';
Why
JOIN
is better:- Performance: The database optimizer can often execute
JOIN
s much more efficiently than subqueries, especiallyIN
orNOT IN
subqueries, by using appropriate join algorithms and indexes. - Readability:
JOIN
syntax clearly expresses the relationship between tables. - Flexibility: You can easily add more tables to the join, specify different join types (
INNER JOIN
,LEFT JOIN
,RIGHT JOIN
,FULL OUTER JOIN
), and include columns from all joined tables in theSELECT
list.
- Performance: The database optimizer can often execute
2. When retrieving multiple columns from a related table: If your subquery would need to return multiple columns to be used in the outer query, a JOIN
is almost always a better choice.
- Subquery equivalent (not possible in many cases, or complex correlated subquery):
SQL
-- This type of subquery is generally not feasible for multiple columns without a correlated subquery -- A correlated subquery for this would be very inefficient SELECT o.OrderID, (SELECT c.CustomerName FROM Customers c WHERE c.CustomerID = o.CustomerID) AS CustomerName, (SELECT c.CustomerAddress FROM Customers c WHERE c.CustomerID = o.CustomerID) AS CustomerAddress FROM Orders o;
- JOIN equivalent (much better):
SQL
SELECT o.OrderID, c.CustomerName, c.CustomerAddress FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID;
3. For aggregations across related tables: When you need to perform aggregations (e.g., COUNT
, SUM
, AVG
) that involve data from multiple tables, JOIN
s combined with GROUP BY
are the standard and most efficient approach.
- Subquery equivalent (less efficient and harder to reason about):
SQL
SELECT CustomerName, (SELECT COUNT(OrderID) FROM Orders o WHERE o.CustomerID = c.CustomerID) AS NumberOfOrders FROM Customers c;
- JOIN equivalent (better):
SQL
SELECT c.CustomerName, COUNT(o.OrderID) AS NumberOfOrders FROM Customers c LEFT JOIN Orders o ON c.CustomerID = o.CustomerID GROUP BY c.CustomerName;
4. When a subquery returns a large result set: If a subquery is returning many rows that then need to be processed by the outer query, a JOIN
can often optimize the query execution plan significantly.
When to use CTE
s instead of subqueries:
CTE
s (Common Table Expressions) are defined using the WITH
clause and are essentially named temporary result sets that you can reference within a single SELECT
, INSERT
, UPDATE
, or DELETE
statement. They are particularly useful for:
1. Improving Readability and Understanding Complex Queries: When you have multiple nested subqueries, a query can quickly become an unreadable mess. CTE
s allow you to break down a complex query into logical, self-contained steps, each with a descriptive name.
- Subquery equivalent (hard to read):
SQL
SELECT p.ProductName, (SELECT AVG(UnitPrice) FROM OrderDetails od WHERE od.ProductID = p.ProductID AND od.OrderID IN (SELECT OrderID FROM Orders WHERE OrderDate >= '2025-01-01')) AS AvgPriceRecentOrders FROM Products p;
- CTE equivalent (much more readable):
SQL
WITH RecentOrders AS ( SELECT OrderID FROM Orders WHERE OrderDate >= '2025-01-01' ), ProductOrderDetails AS ( SELECT ProductID, UnitPrice FROM OrderDetails WHERE OrderID IN (SELECT OrderID FROM RecentOrders) ) SELECT p.ProductName, AVG(pod.UnitPrice) AS AvgPriceRecentOrders FROM Products p JOIN ProductOrderDetails pod ON p.ProductID = pod.ProductID GROUP BY p.ProductName;
2. Reusing the same subquery multiple times within a single query: If you need to use the result of a subquery more than once in the same main query, a CTE
avoids redundant code and potentially allows the database optimizer to execute the CTE
once and reuse the cached result.
- Subquery equivalent (redundant):
SQL
SELECT c.CustomerName, (SELECT COUNT(OrderID) FROM Orders o WHERE o.CustomerID = c.CustomerID) AS TotalOrders, (SELECT SUM(TotalAmount) FROM Orders o WHERE o.CustomerID = c.CustomerID AND OrderDate >= '2025-01-01') AS RecentOrderValue FROM Customers c;
- CTE equivalent (reusable):
SQL
WITH CustomerOrderSummary AS ( SELECT CustomerID, COUNT(OrderID) AS TotalOrders, SUM(CASE WHEN OrderDate >= '2025-01-01' THEN TotalAmount ELSE 0 END) AS RecentOrderValue FROM Orders GROUP BY CustomerID ) SELECT c.CustomerName, cos.TotalOrders, cos.RecentOrderValue FROM Customers c LEFT JOIN CustomerOrderSummary cos ON c.CustomerID = cos.CustomerID;
3. Performing Recursive Queries: CTE
s are essential for writing recursive queries, which are used to traverse hierarchical data (e.g., organizational charts, bill of materials). Subqueries cannot achieve this.
4. Simulating Views for Complex Operations: While views are persistent, CTE
s provide a way to define a temporary, named result set that acts like a view for the duration of a single query. This is useful for complex calculations or data transformations that don’t warrant a permanent view.
5. Breaking down complex calculations: If you have a series of intermediate calculations that build upon each other, CTE
s allow you to define each step clearly.
When a subquery might be acceptable or even preferred:
While JOIN
s and CTE
s are generally better, there are a few scenarios where a subquery might be acceptable or even more concise:
-
Simple Scalar Subqueries: When a subquery returns a single value (a scalar) and is used in the
SELECT
list orWHERE
clause, it can be very concise.SQL
SELECT ProductName, UnitPrice, (SELECT AVG(UnitPrice) FROM Products) AS AveragePrice FROM Products WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products);
However, even here, a
JOIN
with a derived table orCTE
could be used for performance if the subquery is executed multiple times or for very large datasets. -
EXISTS
orNOT EXISTS
clauses for checking existence: These are often very efficient for simply checking if related rows exist or not.SQL
SELECT CustomerName FROM Customers c WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID AND o.OrderDate = '2025-06-01');
-
Correlated Subqueries (use with caution): While generally less efficient than
JOIN
s, correlated subqueries can sometimes be more intuitive for specific row-by-row comparisons where aJOIN
might require more complex logic. However, they should be used sparingly due to potential performance issues.
Conclusion:
As a general rule, prioritize JOIN
s and CTE
s over subqueries in SQL for better:
- Readability and Maintainability: Easier to understand and debug.
- Performance: Database optimizers are typically better at optimizing
JOIN
operations. - Reusability:
CTE
s promote code reuse within a single query. - Modularity: Breaking down complex problems into smaller, manageable parts.
By consistently using JOIN
s and CTE
s, you’ll write more robust, efficient, and understandable SQL queries.