Jae Carr
Jae Carr

Reputation: 1225

Combining similar values in a single column

I have a column that is being used to list competitors names in a table I'm putting together. Right now don't have a lot of control over how these inputs are made, and it causes some serious headaches. There are random spaces and misspellings throughout our data, and yet we need to list the data by competitor.

As an example (not actual SQL I'm using), list of competitors:

Price Cutter
PriceCutter
PriceCuter
Price Cuter

If I ran the query:

SELECT Competitor_Name, SUM(Their_Sales),
FROM Cmdata.Competitors
Where Their_Sales
Between 10000 AND 100000000
Group by Competitor_Name

I would get a different entry for each version of Price Cutter, something I clearly want to avoid.

I would think this problem would come up a lot, but I did a Google search and came up dry. I will admit, the question is kind of hard to articulate in a few words, maybe that's why I didn't come with anything. Either that or this is so basic I should already know...

(PS- Yes, we're moving to a drop down menu, but it's gonna take some time. In the mean time, is there a solution?)

Upvotes: 1

Views: 116

Answers (3)

James Curran
James Curran

Reputation: 103535

(I'm a bit hazy on the syntax, but this is close)

alter table Competitors add column cleanedName (varchar(100));

update Competitors set cleanedName = Replace(Upper(Competitor_Name), ' ', '')

then Group by cleanedName instead of Competitor_Name

Upvotes: 1

Mark Reed
Mark Reed

Reputation: 95315

This is a very hard problem in general. If your database supports it, you could try grouping by SOUNDEX(Competitor_Name) instead of just Competitor_Name.

Really, the Competitor_Name column should be a foreign key into a Competitors table anyway, instead of a bare text field.

Whatever you do to fix, you should also UPDATE the table so that you don't have to do this sort of hoop-jumping in the future.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270463

You need to add a Competitor table, that has a standard name for each competitor.

Then, use foreign key references in other tables.

The problem that you are facing is a data cleansing and data modeling issue. It is not particularly hard to solve, but it does require a fair amount of work. You can get started by getting a list of all the current spellings and standardize them -- probably in an Excel spreadsheet.

If you do that, you can then create a lookup table and change the values by looking them up.

However, in the medium term, you should be creating a Competitor table and modelling the data in the way that your application needs.

Upvotes: 1

Related Questions