abrn
abrn

Reputation: 33

Search for dictionary key within DataFrame column and return dictionary value in a new column

I have a data frame containing transactions from a financial institution. One of the columns ['vendor_full'] is the vendor, but it could contain store numbers, physical location, etc. making it more difficult to summarize based who the vendor actually is.

I have created a dictionary where the key is the vendor name as it may appear in the data frame (or at least a part of the column string) and the value is the vendor name as I would like it to be written to a new column ['vendor_short'].

Based on this question and the answer from @Vaishali I got very close to the solution, but the difference is the user who posted the question above wanted to use the dictionary value as both the search term and the returned value. I would like to search for the key and return the value.

import pandas as pd

data = {'amount': [100, 150, 5, 89, 55, 14], 'vendor_full': ['store_name 1234', 'online_store xx55', 'st_name 9876', 'grocery_store', 'online_shop', 'clothing_store xx']}
cols = ['amount', 'vendor_full']

df = pd.DataFrame(data,columns = cols)

vendor_dict = {'store_name': 'store_name', 'online_store': 'online_store', 'st_name': 'store_name', 'grocery_store': 'grocery_store', 'online_shop': 'online_store', 'clothing_store': 'clothing_store'}

pat = r'({})'.format('|'.join(vendor_dict.values()))
cond = df['vendor_full'].str.contains('|'.join(vendor_dict.keys()))
df.loc[cond, 'vendor_short'] = df['vendor_full'].str.extract((pat), expand=False)

The code above seems to work for the first occurrence of a vendor, but I'm getting NaN for the remaining occurrences.

Actual:

    amount    vendor_full    vendor_short
0   100    store_name 1234   store_name
1   150    online_store xx55 online_store
2   5      st_name 9876      NaN
3   89     grocery_store     grocery_store
4   55     online_shop       NaN
5   14     clothing_store xx clothing_store

Expected/desired:

    amount  vendor_full       vendor_short
0   100     store_name 1234   store_name
1   150     online_store xx55 online_store
2   5       st_name 9876      store_name
3   89      grocery_store     grocery_store
4   55      online_shop       online_store
5   14      clothing_store xx clothing_store

Upvotes: 0

Views: 1864

Answers (1)

Erfan
Erfan

Reputation: 42946

Method 1

Firs we make dataframe out of your dict. Then we extract the names of your df so we can merge on these names and get the vendor_short:

df2 = pd.DataFrame({'vendor_full':list(vendor_dict.keys()),
                    'vendor_short':list(vendor_dict.values())})

s = df['vendor_full'].str.extract("({})".format('|'.join(df2['vendor_full'])))

df['vendor_short'] = s.merge(df2, left_on=0, right_on='vendor_full')['vendor_short']
   amount        vendor_full    vendor_short
0     100    store_name 1234      store_name
1     150  online_store xx55    online_store
2       5       st_name 9876      store_name
3      89      grocery_store   grocery_store
4      55        online_shop    online_store
5      14  clothing_store xx  clothing_store

Method 2

Using .map:

s = df['vendor_full'].str.extract("({})".format('|'.join(vendor_dict.keys())))
df['vendor_short'] = s[0].map(vendor_dict)
   amount        vendor_full    vendor_short
0     100    store_name 1234      store_name
1     150  online_store xx55    online_store
2       5       st_name 9876      store_name
3      89      grocery_store   grocery_store
4      55        online_shop    online_store
5      14  clothing_store xx  clothing_store

Method 3

Provided by cs95 in the comments

Using regex to extract the names from the vendor_full column and mapping them to the dict with .map:

df['vendor_short'] = df['vendor_full'].str.extract('([a-zA-Z_]+)', expand=False).map(vendor_dict)
   amount        vendor_full    vendor_short
0     100    store_name 1234      store_name
1     150  online_store xx55    online_store
2       5       st_name 9876      store_name
3      89      grocery_store   grocery_store
4      55        online_shop    online_store
5      14  clothing_store xx  clothing_store

Upvotes: 1

Related Questions