Friday, January 10, 2014

Find the most popular capitalization of a word or phrase

Today, I needed to find the most popular capitalization of user entered product brands. Previously, I'd just used lowercase brands, something like this:

However, this was returning "whole foods" instead of "Whole Foods", which was considered more desirable.

Using window functions and common table expressions, this is what I came up with:

Given this data:

Here is how it works:

The first part of the query gives us the total count for each brand, grouped by lower case, and for each unique capitalization of that brand provides a count and row number.

In this query, the rows with row_number == 1 are all the unique capitalizations

Of those, we want the ones with the highest count. Again, we partition with another common table expression. This time, ordering by the count per unique capitalization to ensure that the first row for each group was the most popular

From here, we just pick the first row for each brand and get the most popular capitalization

Simple enough eh?

Messy rails code demonstrating this and generating the tables in this post

More about window functions

More about common table expressions

No comments:

Post a Comment