SQL Correlated Subquery

In standard SQL, a subquery is a query nested inside another statement. A correlated subquery is a specific type of subquery that relies on the outer query for its values. Think of it like a "nested loop" in traditional programming: for every single row processed by the outer query, the inner subquery runs once.

Unlike a non-correlated subquery (which runs once, finishes, and passes its result to the outer query), a correlated subquery is executed repeatedly. This deep connection between the two queries allows you to perform complex row-by-row comparisons that wouldn't be possible with a simple join.

Developer Tip: Think of a correlated subquery as a function call inside a loop. The outer query provides the "arguments," and the inner query uses those arguments to return a specific result for that specific row.

 

Syntax of Correlated Subquery

SELECT column1, column2, ...
FROM outer_table o
WHERE column_name operator (SELECT column_name 
                             FROM inner_table i
                             WHERE i.reference_column = o.reference_column);
  • outer_table: The primary table being queried. We usually give it an alias (like o or e) so the subquery can "reach out" and grab its values.
  • inner_table: The table used inside the subquery to perform the check.
  • operator: Comparison operators like =, >, <, or logical operators like EXISTS and IN.
Best Practice: Always use table aliases (e.g., employees e) when writing correlated subqueries. This prevents "ambiguous column" errors and makes it clear which column belongs to the outer query versus the inner query.

 

Example of Correlated Subquery

Let's look at a classic real-world scenario. Imagine you are managing a payroll system and need to find high-performers who are paid significantly more than their immediate peers.

Employees Table:

employee_id name department_id salary
1 John 101 50000
2 Jane 102 60000
3 Michael 101 45000
4 Sarah 102 70000

Departments Table:

department_id department_name
101 HR
102 IT

 

Correlated Subquery Example

Goal: Find employees whose salary is higher than the average salary within their own department.

SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (SELECT AVG(e2.salary) 
                  FROM employees e2 
                  WHERE e2.department_id = e.department_id);

Explanation:

  • The outer query looks at the first employee (John).
  • The inner query takes John's department_id (101) and calculates the average salary for everyone in department 101.
  • The outer query then checks if John's salary is greater than that average.
  • This process repeats for Jane, Michael, and Sarah.
Watch Out: Because the subquery runs for every row, correlated subqueries can be slow on very large tables (millions of rows). In those cases, consider using a JOIN with a derived table or a Window Function for better performance.

Output:

name salary
Jane 60000
Sarah 70000

 

How Correlated Subqueries Work

  1. Row Selection: The outer query fetches a candidate row.
  2. Variable Passing: Values from that candidate row are passed into the inner subquery.
  3. Subquery Execution: The subquery executes using those external values.
  4. Evaluation: The outer query uses the subquery's result to decide whether to include the candidate row in the final output.
  5. Iteration: The process moves to the next row and repeats until all rows are checked.
Common Mistake: Forgetting to link the inner query to the outer query. If you omit WHERE e2.department_id = e.department_id, the subquery will return the average salary of the entire company for every row, turning it into a non-correlated subquery.

 

Correlated Subquery with IN

You can also use correlated subqueries to filter data based on relationships in other tables. While this can often be done with a JOIN, using a subquery is sometimes more readable when you just need to check for existence.

SELECT e.name
FROM employees e
WHERE e.department_id IN (SELECT d.department_id 
                           FROM departments d 
                           WHERE d.department_name = 'IT'
                           AND d.department_id = e.department_id);

Explanation:

  • For every employee, the subquery checks if their department_id belongs to a department named "IT".
  • If the match is found, the employee's name is returned.
Developer Tip: When checking for the existence of related records, EXISTS is usually preferred over IN for correlated subqueries because EXISTS stops searching as soon as it finds the first match, making it slightly more efficient.

 

Key Points

  • A correlated subquery is "correlated" because it depends on the outer query's data.
  • It acts like a row-by-row filter.
  • It is highly flexible and great for "Top N per group" or "Compare against group average" problems.
  • Performance is the main trade-off; use them wisely on large datasets.

 

Summary

Correlated subqueries are a powerful tool in a developer's SQL toolkit. They allow for sophisticated logic by linking the inner and outer queries together. While they are intuitive to write for complex filtering, always keep an eye on execution plans to ensure your database stays performant as your data grows.