In SQL, COUNT() is an aggregate function that we can use to count the number of rows in a query. We can also use this function to figure out if there are duplicate records in a table.

Here is an example of a query that would return one number that represents the number of rows in a table:

view plain print about
1SELECT COUNT(id) AS totel_hotel_count FROM hotels

Here is an example of how we would count the number of hotels with the same name in a table:

view plain print about
1SELECT hotel_name, COUNT(hotel_name) AS hotel_count
2FROM hotels
3GROUP BY hotel_name

The GROUP BY clause that we've added here groups together all the rows in a table that share the same values in the columns listed, into one row. One tricky thing with the GROUP BY clause is, if you add a column to the SELECT list then it has to be added to the GROUP BY clause. Once we start adding columns to the GROUP BY clause, it starts adding more values that we would have to match across all the results in order to group them together (the last example in this entry addresses this).

The results returned above contains all hotels with a count of how many there may be. Now, what about if we want to just grab the Hotels that are duplicates?

view plain print about
1SELECT hotel_name, COUNT(hotel_name) AS hotel_count
2FROM hotels
3GROUP BY hotel_name
4HAVING COUNT(hotel_name) >
1

Notice how we use HAVING instead of doing this in a WHERE clause. The reason behind this is that WHERE only deals with individual rows, so we would use this before the GROUP BY statement - HAVING is used to filter grouped rows.

Here is an example that incorporates a WHERE clause and only grabs the duplicate active hotels:

view plain print about
1SELECT hotel_name, COUNT(hotel_name) AS hotel_count
2FROM hotels
3WHERE hotel_active = '1'
4GROUP BY hotel_name
5HAVING COUNT(hotel_name) >
1

Now, for an example where we add more columns to our GROUP BY clause. I have added the unique ID of each hotel to the SELECT and GROUP BY clauses and we have no results returned. Why? This is because although the names match, the id is not the same across the rows so they aren't grouped together.

view plain print about
1SELECT hotel_name, COUNT(hotel_name) AS hotel_count, id
2FROM hotels
3GROUP BY hotel_name, id
4HAVING COUNT(hotel_name) >
1

'Til next time, Bridget