vehk
vehk

Reputation: 107

Best way to fix data inconsistency

If you have a column in your table that contains inconsistent data for destinations, for example some may be "New York", some "New York, USA", "NY", etc. for different records, and there are already thousands of records, what is the best way to normalize them?

Upvotes: -2

Views: 231

Answers (3)

Michael Durrant
Michael Durrant

Reputation: 96584

Figure out the root cause of the problem you are trying to solve.

Inconsistent data is annoying but not necessarily a problem, for example the post office may still deliver it.

So ask yourself: What is the actual problem here? The inconsistent addresses can be a cause but what problem do they actually cause?

Once you have a good answer to this you'll then know which approaches to pursue in fixing it.

Upvotes: 0

NickW
NickW

Reputation: 9798

Best solution is to fix your data at source, if possible, by ensuring that data can only be entered consistently.

If you have a large number of columns to fix and the data in them is constantly changing then there is no good solution to fixing them downstream. If there is a limited number of columns and a limited number of variations then build a reference table based on the data you have and use that to drive your update statements:

INPUT VALUE CLEANSED VALUE
New York New York
New York, USA New York
NY New York

Upvotes: 2

murathanklc
murathanklc

Reputation: 9

You can add another option for select country or city like amazon ebay etc. there is city option for this problem. In the other side the string manupulation will take long time anyway. If you need string manupulation search keys and change names from IDE etc.

Search New York than change all to NY. ide can do this just 1 click. But this is not sustainable. You can use AI model IF you trust ofcourse.

Upvotes: 0

Related Questions