1cjtc jj

Reputation: 77

Extract keywords from a column into a new column

I am trying to extract keywords from an existing column into a new column so that I can run some groupby operations for further analysis. I have done some research but have not found a solution yet.

The current dataframe looks like the one below, with about 10K rows. I have already converted all entries in the Finding Title column to lowercase:

Entity     Finding Title
Singapore  usb port blocking not implemented
UK         servers using outdated windows server os version

My expected output:

Entity     Finding Title                                     Key
Singapore  usb port blocking not implemented                 usb port
UK         servers using outdated windows server os version  outdated windows

My code is below:

key_word = ['data protection agreement', 'disaster recovery plan', 'hard drive encryption', 'inappropriate access','network', 
        'rule management', 'backup', 'fire', 'password', 'server room', 
        'outdated window', 'usb', 'user policy']

df['Key'] = df['Finding Title'].str.extractall(key_word)

It returns the error "TypeError: unhashable type: 'list'".

Appreciate your suggestions as always. Thank you and stay safe.

Upvotes: 1

Views: 342

Answers (2)

Haze

Reputation: 157

Maybe you can try this:

df['Key'] = df['Finding Title'].str.findall('|'.join(key_word))

But this will give you a list as the output, something like [usb], [outdated window]. I don't know if there is a better way, but you can get a string output by joining the list again, by adding the code below.

df["Key"]= df["Key"].str.join(", ")

So the full code looks something like this:

df['Key'] = df['Finding Title'].str.findall('|'.join(key_word))
df["Key"]= df["Key"].str.join(", ")

Joining with a comma also covers the case where a row contains two or more keywords (usb, outdated window).
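For anyone who wants to run this end to end, here is a minimal sketch using the two sample rows from the question. The `re.escape` step is an addition that is not part of the code above; it assumes the keywords are meant as literal text rather than regex patterns.

```python
import re
import pandas as pd

key_word = ['data protection agreement', 'disaster recovery plan', 'hard drive encryption',
            'inappropriate access', 'network', 'rule management', 'backup', 'fire',
            'password', 'server room', 'outdated window', 'usb', 'user policy']

# Sample rows copied from the question
df = pd.DataFrame({
    'Entity': ['Singapore', 'UK'],
    'Finding Title': ['usb port blocking not implemented',
                      'servers using outdated windows server os version'],
})

# Build one alternation pattern; escaping is an assumption (keywords treated as literal text)
pattern = '|'.join(re.escape(k) for k in key_word)

# findall returns a list of matches per row; str.join turns each list into one string
df['Key'] = df['Finding Title'].str.findall(pattern).str.join(', ')
print(df)
```

Note that the Key values come out as the keywords exactly as written in key_word ('usb', 'outdated window'), so getting 'usb port' or 'outdated windows' would need those strings in the list.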

Upvotes: 2

BENY

Reputation: 323226

Try with

df['Key'] = df['Finding Title'].str.findall('|'.join(key_word))
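If only the first keyword per row is needed for the groupby, a possible variant is `str.extract` with a single capture group, which gives a plain string (or NaN when nothing matches) instead of a list. This is a sketch under assumptions: the keyword list is shortened, and the third row ('France') is a made-up example added only to show the no-match case.

```python
import re
import pandas as pd

key_word = ['outdated window', 'usb', 'password']  # shortened list, for illustration only

df = pd.DataFrame({
    'Entity': ['Singapore', 'UK', 'France'],  # 'France' row is hypothetical
    'Finding Title': ['usb port blocking not implemented',
                      'servers using outdated windows server os version',
                      'visitor log not maintained'],
})

# One capture group around the alternation; escaping assumes literal keywords
pattern = '(' + '|'.join(re.escape(k) for k in key_word) + ')'

# expand=False returns a Series; rows without any keyword become NaN
df['Key'] = df['Finding Title'].str.extract(pattern, expand=False)

# groupby skips NaN keys by default, so unmatched rows are left out of the counts
counts = df.groupby('Key').size()
print(counts)
```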

Upvotes: 1
