Reputation: 107
Suppose a column in your table contains inconsistent destination data: for different records it may hold "New York", "New York, USA", "NY", and so on, and there are already thousands of records. What is the best way to normalize them?
Upvotes: -2
Views: 231
Reputation: 96584
Inconsistent data is annoying but not necessarily a problem; the post office, for example, may still deliver the mail however the city is written.
So ask yourself: what is the actual problem here? The inconsistent destinations may be the underlying cause, but what problem do they actually cause?
Once you have a good answer to this you'll then know which approaches to pursue in fixing it.
Upvotes: 0
Reputation: 9798
The best solution is to fix your data at source, if possible, by ensuring that data can only be entered consistently.
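One common way to enforce consistency at the point of entry is a lookup table of canonical destinations plus a foreign key on the main table, so free-text variants can never be stored. A minimal sketch, assuming hypothetical table names `destination_ref` and `shipments` (nothing in the question names the actual tables):

```sql
-- Canonical list of allowed destination names
CREATE TABLE destination_ref (
    destination_id   INT PRIMARY KEY,
    destination_name VARCHAR(100) NOT NULL UNIQUE
);

INSERT INTO destination_ref (destination_id, destination_name)
VALUES (1, 'New York');

-- The main table stores an ID instead of free text, so
-- "NY" or "New York, USA" simply cannot be entered.
CREATE TABLE shipments (
    shipment_id    INT PRIMARY KEY,
    destination_id INT NOT NULL,
    CONSTRAINT fk_shipments_destination
        FOREIGN KEY (destination_id)
        REFERENCES destination_ref (destination_id)
);
```

The entry form would then offer the contents of `destination_ref` (a dropdown, autocomplete, etc.) rather than a free-text box.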
If you have a large number of columns to fix and the data in them is constantly changing, there is no good way to fix them downstream. If there is a limited number of columns and a limited number of variations, build a reference table based on the data you already have and use it to drive your update statements, as sketched after the table below:
| INPUT VALUE   | CLEANSED VALUE |
|---------------|----------------|
| New York      | New York       |
| New York, USA | New York       |
| NY            | New York       |
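A minimal sketch of driving the cleanup from such a mapping table, assuming the inconsistent values live in a `destination` column on a table called `orders` (both names are illustrative, not from the question):

```sql
-- Mapping of every observed spelling to its cleansed value
CREATE TABLE destination_map (
    input_value    VARCHAR(100) PRIMARY KEY,
    cleansed_value VARCHAR(100) NOT NULL
);

INSERT INTO destination_map (input_value, cleansed_value)
VALUES ('New York',      'New York'),
       ('New York, USA', 'New York'),
       ('NY',            'New York');

-- Rewrite existing rows using the mapping; rows with no
-- mapping entry are left untouched.
UPDATE orders
SET destination = (
    SELECT m.cleansed_value
    FROM destination_map m
    WHERE m.input_value = orders.destination
)
WHERE destination IN (SELECT input_value FROM destination_map);
```

Running `SELECT DISTINCT destination FROM orders WHERE destination NOT IN (SELECT input_value FROM destination_map)` afterwards shows which spellings are still unmapped, so the reference table can be grown until the column is clean.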
Upvotes: 2
Reputation: 9
You can add a select option for country or city, as Amazon, eBay, etc. do; a city picker avoids this problem at entry time. On the other hand, string manipulation will take a long time anyway. If you do need string manipulation, you can search and replace names from your IDE or editor:
search for "New York" and change all occurrences to "NY" with one click. But that is not sustainable. You could also use an AI model, if you trust one, of course.
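The same one-off search-and-replace can be done directly in SQL rather than in an IDE; a minimal sketch, again assuming a hypothetical `orders` table with a `destination` column:

```sql
-- One-off replacement of a single spelling (illustrative only;
-- the mapping-table approach above scales better).
UPDATE orders
SET destination = 'NY'
WHERE destination = 'New York';
```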
Upvotes: 0