Reputation: 7410
I have a Python dictionary as follows:
ref_dict = {
"Company1" :["C1_Dev1","C1_Dev2","C1_Dev3","C1_Dev4","C1_Dev5",],
"Company2" :["C2_Dev1","C2_Dev2","C2_Dev3","C2_Dev4","C2_Dev5",],
"Company3" :["C3_Dev1","C3_Dev2","C3_Dev3","C3_Dev4","C3_Dev5",],
}
I have a Pandas data frame called df whose one of the columns looks like this:
DESC_DETAIL
0 Probably task Company2 C2_Dev5
1 File system C3_Dev1
2 Weather subcutaneous Company2
3 Company1 Travesty C1_Dev3
4 Does not match anything
...........
My goal is to add two extra columns to this data frame and name the columns, COMPANY and DEVICE. The value in each row of the COMPANY column will be either be the company key in the dictionary if it exists in the DESC_DETAIL column or if the corresponding device exists in the DESC_DETAIL column. The value in the DEVICE column will simply be the device string in the DESC_DETAIL column. If no match is found, the corresponding row is empty. Hence the final output will look like this:
DESC_DETAIL COMPANY DEVICE
0 Probably task Company2 C2_Dev5 Company2 C2_Dev5
1 File system C3_Dev1 Company3 C3_Dev1
2 Weather subcutaneous Company2 Company2 NaN
3 Company1 Travesty C1_Dev3 Company1 C1_Dev3
4 Does not match anything NaN NaN
My attempt:
for key, value in ref_dict.items():
df['COMPANY'] = df.apply(lambda row: key if row['DESC_DETAIL'].isin(key) else Nan, axis=1)
This is obviously just wrong and does not work. How do I make it work ?
Upvotes: 0
Views: 68
Reputation: 9941
You can extract values with str.extract
using a regex pattern:
import re
s = pd.Series(ref_dict).explode()
# extract company
df['COMPANY'] = df['DESC_DETAIL'].str.extract(
f"({'|'.join(s.index.unique())})", flags=re.IGNORECASE)
# extract device
df['DEVICE'] = df['DESC_DETAIL'].str.extract(
f"({'|'.join(s)})", flags=re.IGNORECASE)
# fill missing company values based on device
df['COMPANY'] = df['COMPANY'].fillna(
df['DEVICE'].str.lower().map(dict(zip(s.str.lower(), s.index))))
df
Output:
DESC_DETAIL COMPANY DEVICE
0 Probably task Company2 C2_Dev5 Company2 C2_Dev5
1 File system C3_Dev1 Company3 C3_Dev1
2 Weather subcutaneous Company2 Company2 NaN
3 Company1 Travesty C1_Dev3 Company1 C1_Dev3
4 Does not match anything NaN NaN
Upvotes: 1
Reputation: 14238
You need a device to company dictionary as well and you can build it from the ref_dict
easily as below:
dev_to_company_dict = {v:l[0] for l in zip(ref_dict.keys(), ref_dict.values()) for v in l[1]}
Then it is easy to do this:
df['COMPANY'] = df['DESC_DETAIL'].apply(lambda det : ''.join(set(re.split("\\s+", det)).intersection(ref_dict.keys())))
df['COMPANY'].replace('', np.nan, inplace=True)
df['DEVICE'] = df['DESC_DETAIL'].apply(lambda det : ''.join(set(re.split("\\s+", det)).intersection(dev_to_company_dict.keys())))
df['DEVICE'].replace('', np.nan, inplace=True)
df['COMPANY'] = df['COMPANY'].fillna(df['DEVICE'].map(dev_to_company_dict))
Output:
DESC_DETAIL COMPANY DEVICE
0 Probably task Company2 C2_Dev5 Company2 C2_Dev5
1 File system C3_Dev1 Company3 C3_Dev1
2 Weather subcutaneous Company2 Company2 NaN
3 Company1 Travesty C1_Dev3 Company1 C1_Dev3
4 Does not match anything NaN NaN
Upvotes: 0