ekjcfn3902039

Reputation: 1839

Get contents of every dataframe cell that matches a keyword

What is the correct way to search every cell of a dataframe and check whether that cell contains a value from a list of keywords? The example below is short, but the real dataframe could have any number of columns and rows and may contain nulls. I know the code below is not correct, but it is a starting point:

import pandas as pd

myKeywords = ['apple', 'banana', 'orange']
myData = [['apple',10],['coconut',12],['donut',13],['I love apples',13]]
myDf = pd.DataFrame(myData,columns=['colOne','colN'],dtype=float)
print(myDf)

def findAll(keywordList, df):
  return df[(df.values.ravel() in keywordList).reshape(df.shape).any(1)]

result = findAll(myKeywords, myDf)
print(result)

# I would expect it to only print the values 'apple' and 'I love apples'

Upvotes: 1

Views: 442

Answers (1)

Adam.Er8

Reputation: 13393

I use df.values.ravel().astype(str) to flatten the values from all the cells into a single NumPy array of strings, then filter it with any() to keep only the values that contain at least one keyword as a substring.

Try this:

import pandas as pd

myKeywords = ['apple', 'banana', 'orange']
myData = [['apple',10],['coconut',12],['donut',13],['I love apples',13]]
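# dtype=float is dropped here: colOne holds strings, and recent pandas versions raise an error when that cast fails
myDf = pd.DataFrame(myData, columns=['colOne', 'colN'])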

def findAll(keywordList, df):
    return [value for value in df.values.ravel().astype(str) if any(word in value for word in keywordList)]

result = findAll(myKeywords, myDf)
print(result)

Output:

['apple', 'I love apples']
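
Since the question mentions nulls: astype(str) turns missing cells into strings such as 'nan' or 'None', which a short keyword could match by accident. A possible variant that skips nulls before matching is sketched below. The function name find_matching_cells and the sample frame are made up for illustration, and this is one way to do it rather than the only one.

```python
import pandas as pd

def find_matching_cells(keywords, df):
    """Return every non-null cell whose text contains any keyword."""
    matches = []
    # Flatten the frame row by row into a 1-D array of cell values
    for value in df.to_numpy().ravel():
        # Skip NaN/None so they are never compared as the strings 'nan'/'None'
        if pd.isna(value):
            continue
        text = str(value)
        if any(word in text for word in keywords):
            matches.append(value)
    return matches

# Hypothetical sample data with one missing cell
sample = pd.DataFrame({
    'colOne': ['apple', None, 'donut', 'I love apples'],
    'colN': [10, 12, 13, 13],
})
print(find_matching_cells(['apple', 'banana', 'orange'], sample))
```

The matched cells are returned with their original types, so a numeric cell that matched a keyword such as '13' would come back as a number, not a string.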

Upvotes: 1
