Reputation: 33
I have a dataframe with columns like this:
A B
0 - 5923FoxRd 5923 Fox Rd
1 631 Newhaven Ave Modesto
2 Saratoga Street, Suite 200 Saratoga Street, Suite 200
I want to create a list with values from A that matches values from B. The list should look like [- 5923FoxRd, Saratoga Street, Suite 200...]. What is the easiest way to do this?
Upvotes: 1
Views: 202
Reputation: 16673
To make a little go a long way, do the following:
\W+
to str.replace()
str.lower()
drive
to dr
, avenue
to ave
, etc.s1 = df['A'].str.replace('\W+', '').str.lower()
s2 = df['B'].str.replace('\W+', '').str.lower()
lst = [*df[s1==s2]['A']]
lst
Out[1]: ['- 5923FoxRd', 'Saratoga Street, Suite 200']
This is what s1
and s2
look like:
print(s1,s2)
0 5923foxrd
1 631newhavenave
2 saratogastreetsuite200
Name: A, dtype: object
0 5923foxrd
1 modesto
2 saratogastreetsuite200
Name: B, dtype: object
From there, you might want to create some replace values in order to normalize your data even further like:
to_replace = ['drive', 'avenue', 'street']
replaced = ['dr', 'ave', 'str']
to_replace = ['drive', 'avenue', 'street']
replaced = ['dr', 'ave', 'str']
s1 = df['A'].str.replace('\W+', '').str.lower().replace(to_replace, replaced, regex=True)
s2 = df['B'].str.replace('\W+', '').str.lower().replace(to_replace, replaced, regex=True)
lst = [*df[s1==s2]['A']]
lst
print(s1,s2)
0 5923foxrd
1 631newhavenave
2 saratogastrsuite200
Name: A, dtype: object
0 5923foxrd
1 modesto
2 saratogastrsuite200
Name: B, dtype: object
Upvotes: 1