Getting first and last row from a group set without sub queries in mysql
February 18th, 2010
No comments
When I use the GROUP BY clause in mysql It starts grouping by the first different row in the resultset so for example if I have this table:
mysql> select * from grouptest;+----+---------+-------+------------+ | id | city | value | date |+----+---------+-------+------------+ | 1 | tijuana | 10 | 2010-02-02 | | 2 | tijuana | 11 | 2010-02-09 | | 3 | tijuana | 15 | 2010-02-12 | | 4 | tijuana | 17 | 2010-02-17 | | 5 | tecate | 12 | 2010-02-10 | | 6 | tecate | 1 | 2010-02-14 | | 7 | tecate | 29 | 2010-02-17 |+----+---------+-------+------------+ 7 rows in set (0.00 sec)
If I group by city this is what I get:
mysql> SELECT * FROM grouptest g GROUP BY city;+----+---------+-------+------------+ | id | city | value | date |+----+---------+-------+------------+ | 5 | tecate | 12 | 2010-02-10 | | 1 | tijuana | 10 | 2010-02-02 |+----+---------+-------+------------+ 2 rows in set (0.00 sec)
But I want to get the last inserted rows, so could do a sub query like this:
mysql> SELECT * FROM grouptest g -> WHERE g.date = (SELECT MAX(date) FROM grouptest gt WHERE g.city = gt.city) -> GROUP BY g.city;+----+---------+-------+------------+ | id | city | value | date |+----+---------+-------+------------+ | 7 | tecate | 29 | 2010-02-17 | | 4 | tijuana | 17 | 2010-02-17 |+----+---------+-------+------------+ 2 rows in set (0.00 sec)
But if the db is large making subqueries could be expensive, so a cleaner solution might be using the GROUP_CONCAT function, so for example the above query could be written like this:
mysql> SELECT id, city, value, date, GROUP_CONCAT(date) AS all_dates -> FROM grouptest g -> GROUP BY city;+----+---------+-------+------------+---------------------------------------------+ | id | city | value | date | all_dates |+----+---------+-------+------------+---------------------------------------------+ | 5 | tecate | 12 | 2010-02-10 | 2010-02-10,2010-02-14,2010-02-17 | | 1 | tijuana | 10 | 2010-02-02 | 2010-02-02,2010-02-09,2010-02-12,2010-02-17 |+----+---------+-------+------------+---------------------------------------------+ 2 rows in set (0.00 sec)
At this point I can explode the all_dates column in php or split them in perl to get the last date for each city, but a little more work on our query could do our job simpler, something like this for instance:
mysql> SELECT id, city, value, SUBSTRING_INDEX(GROUP_CONCAT(date),',',-1) AS date -> FROM grouptest g -> GROUP BY city;+----+---------+-------+------------+ | id | city | value | date |+----+---------+-------+------------+ | 5 | tecate | 12 | 2010-02-17 | | 1 | tijuana | 10 | 2010-02-17 |+----+---------+-------+------------+ 2 rows in set (0.00 sec)
Or if I need to get the first and last rows for each group set I just add the date column like this:
mysql> SELECT id, city, value, date as first_date, SUBSTRING_INDEX(GROUP_CONCAT(date),',',-1) AS last_date -> FROM grouptest g -> GROUP BY city;+----+---------+-------+------------+------------+ | id | city | value | first_date | last_date |+----+---------+-------+------------+------------+ | 5 | tecate | 12 | 2010-02-10 | 2010-02-17 | | 1 | tijuana | 10 | 2010-02-02 | 2010-02-17 |+----+---------+-------+------------+------------+ 2 rows in set (0.00 sec)
There may be several other ways around this, but for now this approach is enough, I’m not sure about the impact this 2 functions may have on the server but I’m sure it is less than executing a subquery.