Friday, February 24, 2012

Get percentage with variation of field values (country names)

Any help here would be greatly appreciated...

Unfortunately, data wasn't filtered prior to getting inserted into this table. Now I am stuck with cleaning it up. I have thought about writing a query to update all the values, but there are just too many variations, including spelling mistakes, so I've ruled that out as a possible solution.
I have a table which has a Country field but the values per record vary. For example US, U.S., USA, United States, UK, United Kingdom, Canada, Can, etc. I'm trying to find the percent of records per country.

Sample table data: mytable
Id Name Country
1 John US
2 James UK
3 Jane United States
4 Mary Canada
5 Jack U.S.
6 Tony United Kingdom
7 Jeff US
8 Tom Canada
9 Beth UK
10 Mark USA
I would like to show
US: 50% --> (includes any variation of US ncluding US, U.S., USA, United States)
UK: 30%
CAN: 20%
I've made several attempts myself with no luck. Thanks in advance.

You have to clean the country list first.

I would do it by retreving distinct country list and update the table for this column mannually( I mean separate updates). For example,

UPDATE mytableSET COUNTRY='USA'

WHERE Country='US'OR Country='U.S.'OR Country='United States'

These three USA names are from your sample data. This OR list will be long if you include all (mis)spellings you can find for the USA from your dirty data source.

After you have clean data, you can do something like this:

SELECT COUNTRY,count(COUNTRY)as cCount,(

CAST(count(COUNTRY)ASfloat)/CAST((SELECTcount(*)FROM countries$) ASfloat)*100)as countryPercent

FROM mytable

GROUPBY country

|||I figured the data would have to be cleaned... thanks for help with the second query, much appreciated... great help in this forum.

No comments:

Post a Comment