Reputation: 3236
The following code works and displays the desired result. For each value in the SOURCE column, I want to pick a value from the available list, checking the hyphen-separated parts in reverse order when there are several.
import pandas as pd

available = ['a', 'b']
df = pd.DataFrame.from_dict({'SOURCE': ['x-a', 'b-y-z', 'c']})

for entry in df['SOURCE']:
    if '-' not in entry:
        continue
    # Walk the hyphen-separated parts from right to left
    for col in entry.split("-")[::-1]:
        if col in available:
            df.loc[df['SOURCE'] == entry, 'SOURCE'] = col
            break

print(df)
Output:
  SOURCE
0      a
1      b
2      c
Is there a more Pythonic way to do it?
Update: The single characters are just placeholders for strings in the actual problem. If no match is found in the available list, the original value should be kept.
Upvotes: 1
Views: 71
Reputation: 150735
You can use str.extract:
pat = '|'.join(available[::-1])
# expand=False returns a Series, so fillna aligns with df['SOURCE'] row by row
df['SOURCE'] = df['SOURCE'].str.extract(f'({pat})', expand=False).fillna(df['SOURCE'])
Output:
  SOURCE
0      a
1      b
2      c
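One caveat: a regex alternation returns the leftmost match in the string, so for a value such as 'a-b' the pattern above yields 'a', whereas the loop in the question yields 'b'. It also matches substrings rather than whole hyphen-separated parts, which may matter once the placeholders are replaced by real strings. If the right-to-left, whole-part behaviour is important, a sketch along these lines may be closer to the original loop. Here pick_source is a hypothetical helper name, and the extra 'a-b' row is added only for illustration:

```python
import pandas as pd

available = ['a', 'b']
# The 'a-b' row is not in the question; it is added to show the right-to-left rule.
df = pd.DataFrame({'SOURCE': ['x-a', 'b-y-z', 'c', 'a-b']})

allowed = set(available)  # a set keeps membership tests cheap


def pick_source(entry):
    """Return the last hyphen-separated part found in `allowed`, else the entry itself."""
    parts = reversed(entry.split('-'))
    return next((part for part in parts if part in allowed), entry)


df['SOURCE'] = df['SOURCE'].map(pick_source)
print(df)
```

Entries with no matching part, such as 'c', fall through to the default of next() and keep their original value.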
Upvotes: 1