Reputation: 21
I have a data frame with one text column. For each row, I need the keys of all matching dictionary values as a new column. With the code below I get only one key per row: it moves on to the next row without giving the second key. My sample code is below. Any help would be appreciated.
Dict_new = {'key1': ['orange', 'yellow', 'blue'],
            'key2': ['red', 'saffron', 'purple'],
            'key3': ['white', 'grey', 'black']}
Column of the data frame:
white beard and purple hairs.
orange coloured car with black tilted windows.
eyes are red and grey hair.
I have got output as:
key3,
key1,
key2.
I'm only getting the first key for each row, and I can't work out how to continue on to get the second key.
Here is the code I tried:
def new_code(x):
    for keys, values in Dict_new.items():
        for value in values:
            if value in x:
                return keys

df2['new_code'] = df1['column'].apply(new_code)
What I'm expecting as output:
new_code:
key3 key2,
key1 key3,
key2 key3.
Any help would be highly appreciated.
Upvotes: 2
Views: 2106
Reputation: 13387
Try this.
One caveat: this assumes the words in your text are separated only by spaces. Otherwise you either strip the punctuation altogether before doing anything else (which I do here with replace, since your example only contains dots), or you split with re.split().
import pandas as pd

_data = {'txt': ["white beard and purple hairs.",
                 "orange coloured car with black tilted windows.",
                 "eyes are red and grey hair."]}
df = pd.DataFrame(data=_data)
Dict_new = {'key1': ['orange', 'yellow', 'blue'],
            'key2': ['red', 'saffron', 'purple'],
            'key3': ['white', 'grey', 'black']}
# Per row: drop dots, split on whitespace, keep every key whose list shares a word with the text
df['new_code'] = df['txt'].apply(
    lambda x: ' '.join(k for k in Dict_new
                       if set(x.replace('.', '').split()) & set(Dict_new[k])))
print(df)
Output:
                                              txt   new_code
0                   white beard and purple hairs.  key2 key3
1  orange coloured car with black tilted windows.  key1 key3
2                     eyes are red and grey hair.  key2 key3
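Note that the keys above come out in dictionary order (key2 key3 for the first row), while the question asks for them in the order the words appear in the text (key3 key2). A minimal sketch of one way to get that, which also covers the punctuation caveat by tokenising with re.findall instead of replace: the helper names word_to_key and keys_in_order are hypothetical, and lowercasing the text and matching whole words only are assumptions on my part, not something the question specifies.

```python
import re

Dict_new = {'key1': ['orange', 'yellow', 'blue'],
            'key2': ['red', 'saffron', 'purple'],
            'key3': ['white', 'grey', 'black']}

# Hypothetical helper: invert the dictionary so each colour word maps to its key.
word_to_key = {word: key for key, words in Dict_new.items() for word in words}


def keys_in_order(text):
    """Return the matching keys, space-separated, in order of first appearance."""
    found = []
    # \w+ extracts runs of word characters, so dots, commas etc. are ignored.
    # Lowercasing is an assumption: it makes "White" match 'white'.
    for token in re.findall(r"\w+", text.lower()):
        key = word_to_key.get(token)
        if key is not None and key not in found:
            found.append(key)
    return ' '.join(found)


texts = ["white beard and purple hairs.",
         "orange coloured car with black tilted windows.",
         "eyes are red and grey hair."]
results = [keys_in_order(t) for t in texts]
print(results)
```

With the data frame from this answer, the same function should plug in as df['new_code'] = df['txt'].apply(keys_in_order). Unlike the "value in x" substring test in the question, this matches whole words only, so 'red' would not match inside "coloured".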
Upvotes: 1