- SQL Basics
- SQL Introduction
- SQL Syntax
- SQL Sample Database
- SQL SELECT
- SQL WHERE
- SQL ORDER BY
- SQL DISTINCT
- SQL LIMIT
- SQL FETCH
- SQL AND
- SQL OR
- SQL BETWEEN
- SQL IN
- SQL LIKE
- SQL IS NULL
- SQL Comparison Operators
- SQL Logical Operators
- SQL Alias
- SQL CASE
- Joins and Subqueries
- SQL INNER JOIN
- SQL LEFT JOIN
- SQL RIGHT JOIN
- SQL FULL OUTER JOIN
- SQL SELF JOIN
- SQL CROSS JOIN
- SQL Subquery
- SQL Correlated Subquery
- SQL UNION
- SQL INTERSECT
- SQL EXCEPT
- Aggregate Functions
- SQL AVG
- SQL COUNT
- SQL MAX
- SQL MIN
- SQL SUM
- SQL GROUP BY
- SQL HAVING
- SQL ROLLUP
- SQL CUBE
- SQL GROUPING SETS
- Database Management
- SQL CREATE DATABASE
- SQL ALTER DATABASE
- SQL DROP DATABASE
- SQL BACKUP DATABASE
- SQL SHOW DATABASES
- SQL SELECT DATABASE
- Table Management
- SQL CREATE TABLE
- SQL ALTER TABLE
- SQL ADD COLUMN
- SQL DROP COLUMN
- SQL DROP TABLE
- SQL TRUNCATE TABLE
- SQL SHOW TABLES
- SQL RENAME TABLE
- SQL Constraints
- SQL Primary Key
- SQL Foreign Key
- SQL UNIQUE Constraint
- SQL CHECK Constraint
- SQL NOT NULL Constraint
SQL GROUP BY
The GROUP BY statement is a fundamental tool for data analysis in SQL. Its primary purpose is to "collapse" multiple rows that share the same values into summary rows. Think of it as creating buckets: you put all rows with a specific value into one bucket and then perform a calculation, like counting or summing, on the contents of that bucket.
It is almost always used alongside aggregate functions such as COUNT(), SUM(), AVG(), MAX(), and MIN(). Without an aggregate function, a GROUP BY behaves much like a DISTINCT clause.
Syntax of SQL GROUP BY
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name;
- column_name: The specific column you want to use as your "category."
- aggregate_function(): The math operation you want to perform on the groups (e.g., total sales, average age).
- table_name: The source table.
- condition: Filters the raw data before any grouping happens.
AS keyword. It makes your result sets much easier for other developers (or your frontend code) to read.
Key Points
- The GROUP BY clause must come after the
WHEREclause but before theORDER BYclause. - The "Golden Rule": Every column listed in your
SELECTstatement that is not wrapped in an aggregate function must be included in theGROUP BYclause. - If you use
GROUP BYon a column containingNULLvalues, all NULLs are grouped into a single summary row.
Example of SQL GROUP BY
Let's look at a practical scenario. Imagine we are running an e-commerce platform with the following table:
Table: Orders
To calculate the total quantity ordered by each customer:
SELECT customer_name, SUM(quantity) AS total_quantity
FROM orders
GROUP BY customer_name;
Output:
| customer_name | total_quantity |
|---|---|
| Alice | 8 |
| Bob | 6 |
| Charlie | 7 |
Explanation:
- SQL identifies three unique names: Alice, Bob, and Charlie.
- For Alice, it finds two rows (3 + 5) and returns 8.
- For Bob, it finds two rows (2 + 4) and returns 6.
- This reduces our 5-row table into a 3-row summary.
ORDER BY customer_name at the end of your query.
SQL GROUP BY with Multiple Columns
You can group by more than one column to get more granular data. This creates a group for every unique combination of the columns listed.
Example: Total quantity ordered by each customer for each product:
SELECT customer_name, product_name, SUM(quantity) AS total_quantity
FROM orders
GROUP BY customer_name, product_name;
Output:
| customer_name | product_name | total_quantity |
|---|---|---|
| Alice | Apple | 3 |
| Alice | Banana | 5 |
| Bob | Apple | 2 |
| Bob | Cherry | 4 |
| Charlie | Banana | 7 |
Explanation:
- Instead of just grouping by Alice, the database looks for "Alice + Apple" and "Alice + Banana" separately. Since every order in our table was for a unique person-product combination, the result looks similar to the original table, but with the aggregate function applied.
SQL GROUP BY with HAVING Clause
A common hurdle for beginners is trying to filter groups using WHERE. However, the WHERE clause doesn't work on aggregated values (like SUM) because it filters rows before the grouping happens. To filter after grouping, we use HAVING.
Example: Show customers who ordered more than 5 items in total:
SELECT customer_name, SUM(quantity) AS total_quantity
FROM orders
GROUP BY customer_name
HAVING SUM(quantity) > 5;
Output:
| customer_name | total_quantity |
|---|---|
| Alice | 8 |
| Charlie | 7 |
Explanation:
- The database first groups the customers and calculates their totals.
- Then, the HAVING clause looks at those totals and removes Bob because his total (6) is not greater than 5.
WHERE to filter individual rows (e.g., WHERE status = 'shipped') to reduce the data the database has to process. Use HAVING only when you need to filter based on the result of an aggregate function.
SQL GROUP BY with Aggregate Functions
You aren't limited to just one calculation. You can calculate the count, sum, and average all in the same GROUP BY query to get a comprehensive view of your data.
Example: Summarizing product performance:
SELECT
product_name,
COUNT(order_id) AS total_orders,
SUM(quantity) AS total_quantity,
AVG(quantity) AS avg_quantity_per_order
FROM orders
GROUP BY product_name;
Output:
| product_name | total_orders | total_quantity | avg_quantity |
|---|---|---|---|
| Apple | 2 | 5 | 2.5 |
| Banana | 2 | 12 | 6.0 |
| Cherry | 1 | 4 | 4.0 |
Summary
The GROUP BY clause is the engine behind data summarization in SQL. By collapsing rows into categories, you can transform raw transactional data into meaningful insights. Remember the order of operations: filter raw rows with WHERE, group them with GROUP BY, filter the groups with HAVING, and finally organize your output with ORDER BY.