GROUP_CONCAT

I’m working on a community dashboard for work that’ll help me monitor different ways in which people are interacting with the community. One metric I thought it’d be neat to have was how many people have filed more than X bug reports during the last period Y (month, say, or week) and what the average for that period over time has been. This involves a pretty complicated query wherein you have to group by both user and period, which is kind of tricky. It’s also very expensive: With only about 1200 bugs in our database, this query was taking six seconds when I was testing it out. That’s clearly unacceptable. So I poked around a bit and discovered the GROUP_CONCAT function, which lets you group by one column and output the relevant grouped data in a delimited column. So say your query is something like this:

select DATE_FORMAT(creation_ts,’%Y-%m’) date, group_concat(reporter) cnt from bugs group by DATE_FORMAT(creation_ts,’%Y-%m’)

Here I’m getting all months in the database and a listing of all reporters of bugs for that month. The “cnt” column is a comma-delimited lists of user ids for those who reported bugs in the given month, duplicates included (though you can specify DISTINCT to eliminate duplicates). The query returns almost instantly and gives me something I can parse pretty easily in the programming language of my choice.

In my case, I split the “cnt” column on commas for each result and tally the ids for users who appear in the list X or more times. I then pass this data back to a function that does math to get averages and to find a count for the current month, and voila, community metrics.

I blog this here because it’s the sort of thing I might want to remember later, and I always put that stuff here rather than at my work blog, where my regular readers probably figure this should actually go.

One Response so far »

  1. 1

    Mike said,

    July 15, 2007 @ 6:33 pm

    God (irony noted) I miss MySQL. All these years I thought Oracle was the ultimate, and had things I could never dream of… and now I have the reality. Oracle has *less* functions than MySQL and even MS SQL… no cool new stuff like group_concat. Over the years, not having subselects and procedures I leared to make in alone in the desert with just my knife and me… now back in the city I see these people over desiging every query and something taking 6 seconds is considered good if not amazing. I feel like the only person in the room, who is shrinking as he screams “Should we really be putting all our app logic in our data layer?!?!?”

    I still read the MySQL release notes… 5 looks so damn cool…

Comment RSS · TrackBack URI

Say your words