Reputation: 77
I am trying to extract keywords from an existing column into a new column so that I can run groupby operations for further analysis. I have done some research but have not found a solution yet.
The current dataframe looks like the one below, with about 10K rows. I have already converted all entries in the Finding Title column to lowercase:
Entity | Finding Title |
---|---|
Singapore | usb port blocking not implemented |
UK | servers using outdated windows server os version |
My expected output:
Entity | Finding Title | Key |
---|---|---|
Singapore | usb port blocking not implemented | usb port |
UK | servers using outdated windows server os version | outdated windows |
My code is below:
key_word = ['data protection agreement', 'disaster recovery plan', 'hard drive encryption', 'inappropriate access','network',
'rule management', 'backup', 'fire', 'password', 'server room',
'outdated window', 'usb', 'user policy']
df['Key'] = df['Finding Title'].str.extractall(key_word)
It then returns the error "TypeError: unhashable type: 'list'".
Appreciate your suggestions as always. Thank you and stay safe.
Upvotes: 1
Views: 342
Reputation: 157
Maybe you can try this:
df['Key'] = df['Finding Title'].str.findall('|'.join(key_word))
But this will give you a list as the output, something like [usb], [outdated window]. I don't know if there is a better way, but you can get a string output by joining the list elements again with the code below:
df["Key"]= df["Key"].str.join(", ")
So the full code is something like this:
df['Key'] = df['Finding Title'].str.findall('|'.join(key_word))
df["Key"]= df["Key"].str.join(", ")
Joining with a comma also handles titles that contain two or more keywords (for example: usb, outdated window).
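For reference, here is a minimal self-contained sketch of the approach above, using only the two sample rows from the question. The `re.escape` call and the longest-first sort are my own additions (assumptions, not needed for this particular keyword list): they guard against keywords that contain regex special characters and against a shorter keyword matching before a longer one that starts the same way.

```python
import re
import pandas as pd

# Sample rows taken from the question
df = pd.DataFrame({
    "Entity": ["Singapore", "UK"],
    "Finding Title": [
        "usb port blocking not implemented",
        "servers using outdated windows server os version",
    ],
})

key_word = ['data protection agreement', 'disaster recovery plan',
            'hard drive encryption', 'inappropriate access', 'network',
            'rule management', 'backup', 'fire', 'password', 'server room',
            'outdated window', 'usb', 'user policy']

# Build one regex alternation; escape each keyword and put longer ones first
# (assumption: added for safety, the plain '|'.join(key_word) also works here)
pattern = "|".join(re.escape(k) for k in sorted(key_word, key=len, reverse=True))

# findall returns a list of matches per row; join turns each list into a string
df["Key"] = df["Finding Title"].str.findall(pattern).str.join(", ")
print(df)
```

Rows with no matching keyword end up with an empty string in `Key`, so you may want to filter those out before the groupby.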
Upvotes: 2
Reputation: 323226
Try with
df['Key'] = df['Finding Title'].str.findall('|'.join(key_word))
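As far as I can tell, the original error comes from passing a list where the `.str` methods expect a single regex string, which is why the keywords are joined with `|` first. If you only need the first keyword per row as a plain string (handy for groupby), a possible variant is `str.extract` with one capturing group. This is just a sketch: the third row is a made-up example to show what happens when nothing matches.

```python
import pandas as pd

df = pd.DataFrame({
    "Entity": ["Singapore", "UK", "Hypothetical"],
    "Finding Title": [
        "usb port blocking not implemented",
        "servers using outdated windows server os version",
        "no issue found",  # made-up row with no keyword
    ],
})

key_word = ['data protection agreement', 'disaster recovery plan',
            'hard drive encryption', 'inappropriate access', 'network',
            'rule management', 'backup', 'fire', 'password', 'server room',
            'outdated window', 'usb', 'user policy']

# str.extract needs a capturing group, so wrap the alternation in parentheses
pattern = "(" + "|".join(key_word) + ")"

# expand=False returns a Series holding the first match, or NaN if none
df["Key"] = df["Finding Title"].str.extract(pattern, expand=False)

# Rows whose Key is NaN are dropped by groupby by default
print(df.groupby("Key").size())
```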
Upvotes: 1