SQL Remove Duplicates: How To Delete Duplicate Records in SQL

Duplicate records in a database can cause inconsistent results and affect performance. Removing them helps maintain data accuracy and efficiency.

There are several ways to remove duplicates in SQL, both while retrieving records and by deleting rows permanently. The exact syntax depends on the DBMS, such as SQL Server, MySQL, or PostgreSQL.


The common causes of duplicate rows in SQL include the following:

Missing Primary Keys: When tables lack a defined primary key or unique constraint, there is no mechanism to prevent the insertion of duplicate data. This often happens when a table is not normalized, for example when it contains transitive dependencies.

Data Integration Issues: When merging datasets from different sources, improper joins or inconsistencies in data formats can accidentally introduce duplicates.

Manual Data Entry Errors: Human error, such as entering the same record multiple times, is another common cause of duplicate rows.

Removing duplicates matters for the following reasons:

Optimal Performance: Redundant data can slow down queries, especially when dealing with large datasets.

Efficient Storage: Removing duplicates helps optimize storage usage, keeping your database lean.
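To illustrate the first cause listed above (a missing primary key or unique constraint), here is a minimal sketch that runs SQL through Python's built-in sqlite3 module. The customers table and its email column are hypothetical and exist only for this demonstration; other DBMSs raise their own constraint-violation errors.

```python
import sqlite3

def insert_with_unique_constraint():
    """Insert the same email twice into a table that has a UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the UNIQUE constraint on email is what blocks duplicates.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT UNIQUE)"
    )
    conn.execute("INSERT INTO customers (email) VALUES ('a@example.com')")
    rejected = False
    try:
        # Second insert of the same value violates the constraint.
        conn.execute("INSERT INTO customers (email) VALUES ('a@example.com')")
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    conn.close()
    return rejected, count

print(insert_with_unique_constraint())
```

With the constraint in place, the second insert is rejected and only one row is stored.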

SQL Remove Duplicates Methods:

You can filter duplicate records out of query results, or you can permanently delete them from the database. Permanent deletion is important for maintaining data quality. The following methods cover both cases.

Remove Duplicates Using DISTINCT keyword in SQL:

The DISTINCT keyword is used in a SELECT statement to retrieve unique rows. The DISTINCT keyword syntax for removing duplicates is similar for MySQL, PostgreSQL, and SQL Server databases.

SELECT DISTINCT Name
FROM Employees;
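As a small runnable illustration of the query above, the sketch below loads made-up sample rows into an in-memory SQLite database. The names and departments are invented for the demonstration. It also shows that DISTINCT applies to the whole select list, not to a single column.

```python
import sqlite3

def distinct_examples():
    """Compare the row count with DISTINCT on one column and on two columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT)")
    # Made-up sample data: one exact duplicate and one name in two departments.
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?)",
        [("Asha", "Sales"), ("Asha", "Sales"), ("Ravi", "IT"), ("Ravi", "HR")],
    )
    total = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
    names = [
        row[0]
        for row in conn.execute("SELECT DISTINCT Name FROM Employees ORDER BY Name")
    ]
    pairs = conn.execute(
        "SELECT DISTINCT Name, Department FROM Employees "
        "ORDER BY Name, Department"
    ).fetchall()
    conn.close()
    return total, names, pairs

total, names, pairs = distinct_examples()
print(total, names, pairs)
```

Here the table holds four rows, DISTINCT Name returns two values, and DISTINCT Name, Department returns three combinations because Ravi appears in two departments.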

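You can use DISTINCT to copy only the unique rows into a temporary table, empty the original table, and then insert the unique rows back, effectively deleting duplicates. A CTE is not suitable here because it is visible only to the single statement it is attached to, so a separate INSERT could not read from it.

-- Copy the unique rows into a temporary table first (MySQL / PostgreSQL syntax;
-- in SQL Server use SELECT DISTINCT ... INTO #DistinctEmployees instead).
CREATE TEMPORARY TABLE DistinctEmployees AS
SELECT DISTINCT Name, Department
FROM Employees;

-- Empty the original table, then reload only the unique rows.
DELETE FROM Employees;

INSERT INTO Employees (Name, Department)
SELECT Name, Department
FROM DistinctEmployees;

DROP TABLE DistinctEmployees;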
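One way to make the rebuild idea concrete is to stage the unique rows in a temporary table, since a CTE is visible only to the single statement it is attached to. The sketch below does this against an in-memory SQLite database; the sample rows are made up, and temporary-table syntax differs in SQL Server.

```python
import sqlite3

def rebuild_without_duplicates(conn):
    """Stage unique rows in a temp table, empty Employees, then reload it."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMPORARY TABLE DistinctEmployees AS "
        "SELECT DISTINCT Name, Department FROM Employees"
    )
    cur.execute("DELETE FROM Employees")
    cur.execute(
        "INSERT INTO Employees (Name, Department) "
        "SELECT Name, Department FROM DistinctEmployees"
    )
    cur.execute("DROP TABLE DistinctEmployees")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT)")
# Made-up sample data containing exact duplicates.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", "Sales"), ("Asha", "Sales"),
     ("Ravi", "IT"), ("Ravi", "IT"), ("Ravi", "IT")],
)
rebuild_without_duplicates(conn)
rows = conn.execute(
    "SELECT Name, Department FROM Employees ORDER BY Name"
).fetchall()
print(rows)
```

On a production table this rebuild is best wrapped in a transaction, because the table is briefly empty between the DELETE and the INSERT.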

Remove Duplicates Using ROW_NUMBER() function in SQL:

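The following example deletes 20 rows from a weather table using ROW_NUMBER(): the 20 most recent records after the latest one, which is kept. The same pattern removes duplicates when the numbering is restarted for each group of identical values with PARTITION BY and every row with rn > 1 is deleted.

The walkthrough below covers the logic, the execution flow, and performance notes. The window function is standard SQL, but some of the surrounding syntax (for example START TRANSACTION) varies by DBMS.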

ROW_NUMBER() OVER (ORDER BY recorded_date DESC)

rn = 1 → latest record (KEEP)
rn = 2–21 → next 20 records (DELETE)

Step 1: Preview the rows to be deleted:

SELECT id, city, temperature, recorded_date, rn
FROM (
    SELECT id,
           city,
           temperature,
           recorded_date,
           ROW_NUMBER() OVER (ORDER BY recorded_date DESC) AS rn
    FROM weather
) t
WHERE rn BETWEEN 2 AND 21;

Output:

id	city	temperature	recorded_date	rn
45	Delhi	31	2025-01-19	2
44	Delhi	30	2025-01-18	3
…	…	…	…	…
25	Delhi	28	2025-01-01	21
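One caveat with this preview: if two rows share the same recorded_date, ROW_NUMBER() may number them in either order. A possible remedy, sketched below with made-up rows in an in-memory SQLite database (version 3.25 or later for window functions), is to add id as a tie-breaker in the ORDER BY. The tie-breaker is an assumption added here and is not part of the original query.

```python
import sqlite3

def rank_by_recency(rows):
    """Return (id, recorded_date, rn) with the newest row numbered 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weather (id INTEGER PRIMARY KEY, city TEXT, "
        "temperature INTEGER, recorded_date TEXT)"
    )
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", rows)
    ranked = conn.execute(
        """
        SELECT id, recorded_date,
               -- id DESC breaks ties between rows that share a date
               ROW_NUMBER() OVER (ORDER BY recorded_date DESC, id DESC) AS rn
        FROM weather
        ORDER BY rn
        """
    ).fetchall()
    conn.close()
    return ranked

# Made-up sample rows; ids 2 and 3 share the same date.
sample = [
    (1, "Delhi", 28, "2025-01-01"),
    (2, "Delhi", 30, "2025-01-02"),
    (3, "Delhi", 31, "2025-01-02"),
]
ranked = rank_by_recency(sample)
for row in ranked:
    print(row)
```

With the tie-breaker, id 3 is always rn = 1 and id 2 is always rn = 2, so repeated runs select the same rows.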

Step 2: DELETE Query using ROW_NUMBER():

DELETE FROM weather
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (ORDER BY recorded_date DESC) AS rn
        FROM weather
    ) t
    WHERE rn BETWEEN 2 AND 21
);
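To check what this DELETE leaves behind, the following sketch runs it against an in-memory SQLite database (version 3.25 or later) holding 25 made-up rows, one per day. The city and temperature values are invented sample data.

```python
import sqlite3
from datetime import date, timedelta

DELETE_SQL = """
DELETE FROM weather
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (ORDER BY recorded_date DESC) AS rn
        FROM weather
    ) t
    WHERE rn BETWEEN 2 AND 21
)
"""

def build_weather(n_rows=25):
    """Create a weather table with one made-up row per day, ids 1..n_rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weather (id INTEGER PRIMARY KEY, city TEXT, "
        "temperature INTEGER, recorded_date TEXT)"
    )
    start = date(2025, 1, 1)
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?, ?, ?)",
        [
            (i + 1, "Delhi", 25 + i % 7, (start + timedelta(days=i)).isoformat())
            for i in range(n_rows)
        ],
    )
    return conn

conn = build_weather()
deleted = conn.execute(DELETE_SQL).rowcount
remaining = [row[0] for row in conn.execute("SELECT id FROM weather ORDER BY id")]
print(deleted, remaining)
```

Twenty rows are removed. The survivors are the latest record (id 25) and the four oldest records (ids 1 to 4), which confirms that rn 2 to 21 targets the most recent rows after the latest one, not the oldest.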

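Why the extra subquery?

A window function alias such as rn cannot be filtered in the WHERE clause of the query that computes it, so the ranking has to live in a derived table. The derived table also matters in MySQL, which does not allow a DELETE to select directly from its own target table in a subquery (error 1093); wrapping the SELECT in a derived table avoids that error. Oracle, PostgreSQL, and SQL Server do not have this restriction.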

Execution Time Analysis

Time Complexity

Operation	Cost
Table scan	O(n)
ORDER BY	O(n log n)
Delete 20 rows	O(20), effectively constant

Without an index on recorded_date, the O(n log n) sort dominates.

Performance Optimization:

Create an index on recorded_date so the DBMS can read rows in date order instead of sorting the whole table:

CREATE INDEX idx_weather_recorded_date
ON weather(recorded_date DESC);
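To see whether the optimizer picks up the index, most DBMSs offer some form of EXPLAIN. The sketch below uses SQLite's EXPLAIN QUERY PLAN as one example. The wording of the plan differs between DBMSs and versions, so the code only prints it and makes no claim about its exact text.

```python
import sqlite3

def create_date_index(conn):
    """Create the index and return the index names plus a query plan."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_weather_recorded_date "
        "ON weather(recorded_date DESC)"
    )
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'index' AND tbl_name = 'weather'"
        )
    ]
    # Plan for reading the 21 newest rows, the range the ranking query needs.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM weather ORDER BY recorded_date DESC LIMIT 21"
    ).fetchall()
    return names, plan

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (id INTEGER PRIMARY KEY, city TEXT, "
    "temperature INTEGER, recorded_date TEXT)"
)
index_names, plan = create_date_index(conn)
print(index_names)
for step in plan:
    print(step)
```

If the plan mentions idx_weather_recorded_date, the rows are being read in index order rather than sorted on every run.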

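Approximate execution time by table size (indicative only; actual timings depend on hardware, DBMS, and indexing):

Rows	Time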
10K	< 10 ms
1M	80–200 ms
10M	400–900 ms

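Best Practice:

Run the DELETE inside a transaction, inspect the result, and roll back. Once the result looks correct, run it again and use COMMIT instead of ROLLBACK. START TRANSACTION is MySQL and PostgreSQL syntax; SQL Server uses BEGIN TRANSACTION.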

START TRANSACTION;

DELETE FROM weather
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (ORDER BY recorded_date DESC) AS rn
        FROM weather
    ) t
    WHERE rn BETWEEN 2 AND 21
);

SELECT * FROM weather ORDER BY recorded_date DESC;

ROLLBACK;
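The dry-run pattern above can be sketched in Python with sqlite3. The helper name trial_delete is hypothetical, SQLite spells the opening statement BEGIN rather than START TRANSACTION, and the 25 sample rows are made up.

```python
import sqlite3

def trial_delete(conn, delete_sql):
    """Run delete_sql inside a transaction, count the rows, then roll back."""
    before = conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
    conn.execute("BEGIN")  # SQLite's spelling of START TRANSACTION
    try:
        conn.execute(delete_sql)
        during = conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
    finally:
        conn.execute("ROLLBACK")  # undo the delete; nothing is persisted
    after = conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
    return before, during, after

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / ROLLBACK above fully control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE weather (id INTEGER PRIMARY KEY, recorded_date TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [(i, f"2025-01-{i:02d}") for i in range(1, 26)],
)
counts = trial_delete(
    conn,
    """
    DELETE FROM weather
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (ORDER BY recorded_date DESC) AS rn
            FROM weather
        ) t
        WHERE rn BETWEEN 2 AND 21
    )
    """,
)
print(counts)
```

The three counts show the table before the delete, inside the transaction, and after the rollback, which makes it easy to confirm that the trial run changed nothing.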

Remove Duplicates Using GROUP BY and COUNT() in SQL:

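GROUP BY works on groups of rows rather than on a per-row ranking, so the approach changes:

Group the rows by recorded_date.

Use COUNT() to see how many rows each date holds.

Restrict the delete by date. Because this removes every row on each selected date, it deletes exactly 20 rows only when each date holds a single row.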

Find the Latest Date (rows on this date are kept):

SELECT MAX(recorded_date) AS latest_date
FROM weather;

Preview Rows Eligible for Delete:

SELECT recorded_date, COUNT(*) AS cnt
FROM weather
WHERE recorded_date < (
    SELECT MAX(recorded_date) FROM weather
)
GROUP BY recorded_date
ORDER BY recorded_date DESC;

recorded_date	cnt
2025-01-19	3
2025-01-18	4
2025-01-17	5
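The same GROUP BY and COUNT() pattern is also the usual way to surface duplicates: adding HAVING COUNT(*) > 1 keeps only the groups that occur more than once. The sketch below assumes, for illustration only, that a duplicate means the same city and recorded_date, and that the row with the lowest id in each group should be kept. The sample rows are made up.

```python
import sqlite3

def find_and_remove_duplicates(conn):
    """List duplicated (city, recorded_date) groups, then keep one row each."""
    dup_groups = conn.execute(
        """
        SELECT city, recorded_date, COUNT(*) AS cnt
        FROM weather
        GROUP BY city, recorded_date
        HAVING COUNT(*) > 1
        ORDER BY city, recorded_date
        """
    ).fetchall()
    # Keep the lowest id in every group and delete the rest.
    deleted = conn.execute(
        """
        DELETE FROM weather
        WHERE id NOT IN (
            SELECT MIN(id) FROM weather GROUP BY city, recorded_date
        )
        """
    ).rowcount
    return dup_groups, deleted

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (id INTEGER PRIMARY KEY, city TEXT, recorded_date TEXT)"
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        (1, "Delhi", "2025-01-18"), (2, "Delhi", "2025-01-18"),
        (3, "Delhi", "2025-01-19"),
        (4, "Pune", "2025-01-19"), (5, "Pune", "2025-01-19"),
        (6, "Pune", "2025-01-19"),
    ],
)
dup_groups, deleted = find_and_remove_duplicates(conn)
kept = [row[0] for row in conn.execute("SELECT id FROM weather ORDER BY id")]
print(dup_groups, deleted, kept)
```

In MySQL the inner SELECT would need to be wrapped in a derived table, because MySQL does not let a DELETE select directly from its own target table.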
…	…

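DELETE Rows From the 20 Most Recent Earlier Dates (GROUP BY):

This statement selects the 20 most recent dates before the latest one and deletes every row recorded on them. With the counts shown above (3, 4, 5, …) that is more than 20 rows; to delete exactly 20 rows, use the ROW_NUMBER() method. LIMIT is MySQL and PostgreSQL syntax; SQL Server uses TOP instead.

DELETE FROM weather
WHERE recorded_date IN (
    SELECT recorded_date
    FROM (
        SELECT recorded_date
        FROM weather
        WHERE recorded_date < (
            SELECT MAX(recorded_date) FROM weather
        )
        GROUP BY recorded_date
        ORDER BY recorded_date DESC
        LIMIT 20
    ) t
);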
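A small experiment makes the behaviour of this date-based delete visible. The sketch below parameterizes the number of dates (the helper name and the n_dates parameter are hypothetical) and uses made-up rows where some dates hold several records.

```python
import sqlite3

def delete_rows_on_recent_dates(conn, n_dates):
    """Delete all rows on the n_dates most recent dates before the latest."""
    cur = conn.execute(
        """
        DELETE FROM weather
        WHERE recorded_date IN (
            SELECT recorded_date
            FROM (
                SELECT recorded_date
                FROM weather
                WHERE recorded_date < (SELECT MAX(recorded_date) FROM weather)
                GROUP BY recorded_date
                ORDER BY recorded_date DESC
                LIMIT ?
            ) t
        )
        """,
        (n_dates,),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (id INTEGER PRIMARY KEY, recorded_date TEXT)")
# Made-up data: 1 row on Jan 2, 2 on Jan 3, 3 on Jan 4, 1 on Jan 5 (latest).
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [
        (1, "2025-01-02"),
        (2, "2025-01-03"), (3, "2025-01-03"),
        (4, "2025-01-04"), (5, "2025-01-04"), (6, "2025-01-04"),
        (7, "2025-01-05"),
    ],
)
deleted = delete_rows_on_recent_dates(conn, 2)
kept = [row[0] for row in conn.execute("SELECT id FROM weather ORDER BY id")]
print(deleted, kept)
```

Selecting two dates removes five rows here, so the number of rows deleted follows the per-date counts rather than the LIMIT value.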

Conclusion
Removing duplicate rows in SQL is crucial for maintaining clean and accurate datasets. The methods outlined (DISTINCT, ROW_NUMBER(), and GROUP BY with COUNT()) offer versatile solutions for different scenarios. Whether you need to return unique rows from a query, delete a ranked range of rows while retaining the latest record, or remove rows by group, SQL provides robust tools to achieve these goals.

Frequently Asked Questions (FAQ) regarding SQL Remove Duplicates:

1. What is a duplicate record in SQL?

Answer:
A duplicate record exists when two or more rows share the same values in columns that should be unique (e.g., the same email or ID).

2. How do you delete duplicates using ROW_NUMBER()?

Answer:
Assign row numbers per group and delete rows where ROW_NUMBER() > 1.

DELETE FROM table_name
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS rn
        FROM table_name
    ) t WHERE rn > 1
);
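As a runnable version of this answer, the sketch below applies the same statement to a hypothetical contacts table, where email plays the role of col. It uses an in-memory SQLite database (version 3.25 or later) and made-up rows.

```python
import sqlite3

def dedupe_by_email(conn):
    """Delete every row except the lowest id within each email group."""
    return conn.execute(
        """
        DELETE FROM contacts
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
                FROM contacts
            ) t
            WHERE rn > 1
        )
        """
    ).rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
# Made-up sample data: 'a' appears three times, 'b' twice, 'c' once.
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        (1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"),
        (4, "a@example.com"), (5, "b@example.com"), (6, "c@example.com"),
    ],
)
deleted = dedupe_by_email(conn)
kept = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
print(deleted, kept)
```

Because the rows are ordered by id within each partition, the first occurrence of every email survives; changing the ORDER BY changes which row is kept.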