Reputation: 439
I can't really find any good solution to this from stackoverflow. I managed to get full word match based on iterating over the list, and assigning the value to a new column. I guess df.lookup could also be used in some way to achieve the same, as well as df.get_value and probably also a join...
This is the solution I got working first. This is for looking up full words, but what is the best way to lookup based on wildcard and take the first result? An example could be having a list of bank transactions, trying to match against names of stores to assign a category.
import pandas as pd
df = pd.read_csv("transactions.csv")
d = {
'SUBWAY': '9',
'TRANSFER TO': '5',
'Best Buy': '8'
}
for index, row in df.iterrows():
if(row['Description'] in d.keys()):
df.at[index, 'Category'] = d[row['Description']]
Upvotes: 2
Views: 928
Reputation: 164773
For partial matches of dictionary keys within df['Description']
, you can iterate your dictionary instead of your dataframe:
d = {'IKEA': '9', 'TRANSFER TO SAVINGS': '5', 'Best buy': '8'}
for k, v in d.items():
df.loc[df['Description'].str.contains(k), 'Category'] = v
Assuming you have a large number of rows relative to dictionary keys, this will be considerably more efficient than iterating rows.
Be careful with dictionary ordering. Pre v3.7, dictionaries should not be assumed to be ordered. In CPython v3.6 and v3.7+, dictionaries are insertion ordered. The above logic will ensure the last match sticks.
For exact matches, you can use pd.Series.map
followed by fillna
:
d = {'IKEA': '9', 'TRANSFER TO SAVINGS': '5', 'Best buy': '8'}
df['Category'] = df['Description'].map(d).fillna(df['Category'])
The fillna
is important as you don't want to overwrite existing values in 'Category'
with NaN
for instances where 'Description'
is not in d
.
A usually less efficient alternative which avoids fillna
is possible via pd.Series.replace
.
Upvotes: 5