Reputation: 1839
What is the correct way to search every cell of a DataFrame and check whether that cell contains a value from a list of keywords? The example below is short; the real DataFrame could have any number of columns and rows and may contain nulls. I know it's not correct, but here is a starting point:
import pandas as pd
myKeywords = ['apple', 'banana', 'orange']
myData = [['apple',10],['coconut',12],['donut',13],['I love apples',13]]
myDf = pd.DataFrame(myData,columns=['colOne','colN'])
print(myDf)
def findAll(keywordList, df):
    return df[(df.values.ravel() in keywordList).reshape(df.shape).any(1)]
result = findAll(myKeywords, myDf)
print(result)
# I would expect it to only print the values 'apple' and 'I love apples'
Upvotes: 1
Views: 442
Reputation: 13393
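I use df.values.ravel().astype(str) to get the values of all cells as one flat array of strings, then filter it with any to keep each value that has at least one keyword as a substring. Nulls are converted to strings as well (NaN becomes 'nan'), so they do not raise an error. Try this: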
import pandas as pd
myKeywords = ['apple', 'banana', 'orange']
myData = [['apple',10],['coconut',12],['donut',13],['I love apples',13]]
# dtype=float is omitted: pandas may raise a ValueError when asked to cast 'apple' to float
myDf = pd.DataFrame(myData,columns=['colOne','colN'])
def findAll(keywordList, df):
    # flatten every cell to a string, then keep the values that contain any keyword
    return [value for value in df.values.ravel().astype(str) if any(word in value for word in keywordList)]
result = findAll(myKeywords, myDf)
print(result)
Output:
['apple', 'I love apples']
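If you need the matching rows rather than the matching values (which is what the indexing in the question seems to be after), a boolean mask is one option. This is only a sketch under some assumptions: find_rows is a hypothetical helper name, the extra null row is added just to show null handling, and matching is done with a regex built from the keywords instead of the `in` check used above.

```python
import re

import numpy as np
import pandas as pd

keywords = ['apple', 'banana', 'orange']
# Same sample data as above, plus one assumed row with a null in colOne.
df = pd.DataFrame(
    [['apple', 10], ['coconut', 12], ['donut', 13], ['I love apples', 13], [np.nan, 14]],
    columns=['colOne', 'colN'],
)

def find_rows(keyword_list, frame):
    # One alternation pattern; re.escape keeps each keyword literal.
    pattern = '|'.join(re.escape(word) for word in keyword_list)
    # Cell-by-cell boolean mask. Every column is cast to str so numeric
    # columns can be searched too; original nulls never count as matches.
    mask = frame.apply(
        lambda col: col.astype(str).str.contains(pattern, regex=True, na=False)
        & col.notna()
    )
    # Keep the rows where at least one cell matched.
    return frame[mask.any(axis=1)]

matches = find_rows(keywords, df)
print(matches)
```

With this data it keeps the rows at index 0 and 3 ('apple' and 'I love apples'). An empty keyword list would produce an empty pattern that matches every non-null cell, so guard against that if it can happen.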
Upvotes: 1