SQL DISTINCT

In a real-world database, a column often holds the same value in many rows. For example, a customers table might have hundreds of entries from the same city. The DISTINCT keyword removes duplicate rows from a query's result set, so each unique value (or combination of values) appears only once.

Developer Tip: Think of DISTINCT as a unique filter for your result set. It doesn't change the data in your table; it only changes how the data is presented in your query results.

Basic Syntax of DISTINCT

SELECT DISTINCT column1, column2, ...
FROM table_name;
  • column1, column2, ...: These are the columns where you want to identify unique values.
  • If you specify only one column, SQL returns only the unique entries for that specific field.
  • If multiple columns are used, SQL evaluates the combination of those columns. A row is only considered a duplicate if every single column value in the SELECT list matches another row.
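Common Mistake: Beginners often try to apply DISTINCT to only one column while selecting several: SELECT DISTINCT(city), name FROM customers. DISTINCT is not a function, so the parentheses are just grouping around city. The keyword applies to the entire row of the result set, and this query returns every unique (city, name) pair, not one row per city.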
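The point about parentheses can be checked with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory table. The sample rows and the two-column table layout are assumptions made for illustration, and other database engines may format results differently.

```python
import sqlite3

# Hypothetical sample data; the table layout is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Lisbon"), ("Bruno", "Lisbon"), ("Ana", "Lisbon"), ("Chen", "Oslo")],
)

# The parentheses around city are plain grouping, not a function call,
# so both queries de-duplicate on the full (city, name) row.
with_parens = conn.execute(
    "SELECT DISTINCT(city), name FROM customers ORDER BY city, name"
).fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT city, name FROM customers ORDER BY city, name"
).fetchall()

print(with_parens)
print(without_parens)
```

Both queries return the same three rows, and Lisbon appears twice because it is paired with two different names.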

Key Features of DISTINCT Clause

1. Removing Duplicate Values from One Column

When you need a quick list of categories, locations, or status types, using DISTINCT on a single column is the most direct approach.

SELECT DISTINCT city
FROM customers;

This query will scan the customers table and return a clean list of unique cities, no matter how many customers live in each one.

2. Removing Duplicate Rows Based on Multiple Columns

You can use DISTINCT with multiple columns to find unique pairings or groups. This is helpful when you want to see all unique locations (City + Country) without seeing the same city listed multiple times for the same country.

SELECT DISTINCT city, country
FROM customers;

In this case, "Springfield, USA" and "Springfield, UK" would both appear because their combinations are unique, even though the city name is the same.
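The Springfield case can be reproduced with a short sketch, again using Python's built-in sqlite3 module. The customer names and the exact rows are hypothetical and exist only to create duplicate and non-duplicate pairings.

```python
import sqlite3

# Hypothetical rows: two customers share Springfield/USA, one is in Springfield/UK.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "Springfield", "USA"),
        ("Bruno", "Springfield", "USA"),
        ("Chen", "Springfield", "UK"),
        ("Dita", "Leeds", "UK"),
    ],
)

# A row is dropped only when BOTH city and country match an earlier row.
pairs = conn.execute(
    "SELECT DISTINCT city, country FROM customers ORDER BY city, country"
).fetchall()
print(pairs)
```

Four customers collapse to three pairings: the two Springfield/USA rows merge, while Springfield/UK stays separate.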

Best Practice: Use multi-column DISTINCT when cleaning data for reports, such as identifying unique shipping routes or unique product-supplier pairings.

3. DISTINCT in Aggregate Functions

DISTINCT can also be used inside aggregate functions such as COUNT(). This lets you count how many unique values exist rather than counting the total number of rows.

SELECT COUNT(DISTINCT city)
FROM customers;

Instead of getting the total number of customers, this query tells you exactly how many different cities your customers are located in.

Watch Out: NULL handling depends on context. SELECT DISTINCT treats all NULLs as one value, so if you have 10 rows with NULL in the city column, SELECT DISTINCT city returns a single NULL row. COUNT(DISTINCT city), like other aggregates, skips NULLs entirely, so that NULL is not counted.
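The difference between the two NULL behaviors is easy to see side by side. This is a minimal sketch using Python's built-in sqlite3 module, and the sample rows are assumptions chosen so that two customers have no city recorded.

```python
import sqlite3

# Hypothetical data: two customers have no city (NULL, written as None in Python).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Ana", "Lisbon"),
        ("Bruno", "Lisbon"),
        ("Chen", "Oslo"),
        ("Dita", None),
        ("Emil", None),
    ],
)

# SELECT DISTINCT keeps one row for all the NULLs.
distinct_rows = conn.execute("SELECT DISTINCT city FROM customers").fetchall()

# COUNT(DISTINCT ...) skips NULLs, so only real city names are counted.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT city) FROM customers"
).fetchone()[0]

# COUNT(*) counts every row regardless of NULLs.
total_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(len(distinct_rows), distinct_count, total_rows)
```

With this data, SELECT DISTINCT yields three rows (Lisbon, Oslo, and one NULL), COUNT(DISTINCT city) yields 2, and COUNT(*) yields 5.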

Example Queries

1. Get Unique Job Titles from the Employees Table

Imagine a company with 500 employees, but only 10 actual roles (Developer, Manager, HR, etc.). To see the list of roles:

SELECT DISTINCT job_title
FROM employees;

2. Get Unique City and Country Combinations

Useful for logistics and mapping applications where you need to know every unique geographical area your business serves.

SELECT DISTINCT city, country
FROM customers;

3. Count the Number of Unique Products Sold

In a sales ledger, the same product ID might appear thousands of times. To find out how many different products were sold on a given day:

SELECT COUNT(DISTINCT product_id)
FROM sales
WHERE sale_date = '2023-10-01';
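To show how the filtered count differs from a plain row count, here is a small sketch using Python's built-in sqlite3 module. The sales rows are made up, and storing sale_date as ISO-formatted text is an assumption of this example rather than something the query requires.

```python
import sqlite3

# Hypothetical ledger: product 101 sold twice on the target day,
# product 103 sold only on the day before.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, sale_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        (101, "2023-10-01"),
        (101, "2023-10-01"),
        (102, "2023-10-01"),
        (103, "2023-09-30"),
    ],
)

# WHERE narrows the rows first; COUNT(DISTINCT ...) then counts unique IDs.
row_count, product_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT product_id) "
    "FROM sales WHERE sale_date = ?",
    ("2023-10-01",),
).fetchone()

print(row_count, product_count)
```

Three sales rows match the date, but they cover only two different products.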

4. Get Unique Product Categories from the Products Table

Perfect for populating a dropdown menu in a web application's user interface.

SELECT DISTINCT category
FROM products;

Important Notes

  • Performance Cost: To find duplicates, the database typically has to sort or hash the rows. On tables with millions of rows, this can slow the query down noticeably. Where possible, use a WHERE clause to limit the rows before they are de-duplicated.
  • The "All Columns" Effect: If you use SELECT DISTINCT *, SQL looks for rows where every single value in every column is identical. If your table has a unique primary key (like id), SELECT DISTINCT * returns every row, because the ID already makes each row unique.
  • NULL Handling: For SELECT DISTINCT, SQL treats all NULL values as identical, so they collapse into a single row. Inside an aggregate such as COUNT(DISTINCT column), NULLs are ignored.
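The "All Columns" effect can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table definition and rows are hypothetical, and the id column is an auto-assigned integer primary key.

```python
import sqlite3

# Hypothetical table: two rows are identical except for the auto-assigned id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Lisbon"), ("Ana", "Lisbon"), ("Chen", "Oslo")],
)

# With id included, no two rows are identical, so nothing is removed.
all_columns = conn.execute("SELECT DISTINCT * FROM customers").fetchall()

# Leaving id out lets the repeated (name, city) pair collapse.
without_id = conn.execute(
    "SELECT DISTINCT name, city FROM customers"
).fetchall()

print(len(all_columns), len(without_id))
```

SELECT DISTINCT * returns all three rows, while selecting only name and city returns two.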

Summary

The DISTINCT keyword is an essential tool for any developer's SQL toolkit. It simplifies data analysis by stripping away redundancy, allowing you to focus on unique data points. Whether you are generating a list of unique categories for a UI or counting unique values for a business report, DISTINCT keeps your results concise.