Sidhartha
Sidhartha

Reputation: 1100

How to search a partial String in the whole dataframe using pandas?

How to search for a string value on each and every columns using pandas . Lets say I have 32 columns ,

df[df['A'].str.contains("hello")]

this returns whether the value is present in "A" column or not ,How to search on every columns and the row in which the value is exist . Dataset :

A           B           C
1           hi          hie
2           bye         Hello

If I search for "hello" or "Hello" output should be :

A           B            C
2           bye         Hello

Upvotes: 5

Views: 5628

Answers (3)

jedi
jedi

Reputation: 585

A way to check for existence of partial string, if you need entire string it will be faster than this:

df.stack().str.contains('search_string').sum()>0

However, to answer your question. Convert the DataFrame into Series by stacking all columns. Then search the Series for the string of interest. From the resulting matches, get the row index that is the same as it was in the DataFrame. Using the row index look up the rows from the DataFrame. I think this method is at least 20% faster than applymap method.

s = df.stack() # convert entire data frame into a series of values df.iloc[s[s.str.contains('search_string',na=False)].index.get_level_values(0)]

Upvotes: 0

jezrael
jezrael

Reputation: 862601

I think you can use:

df = pd.DataFrame({'A':['hello fgf','s','f'],'B':['d','ff hello','f'],'C':[4,7,8]})
print (df)
           A         B  C
0  hello fgf         d  4
1          s  ff hello  7
2          f         f  8

mask = df.applymap(lambda x: 'hello' in str(x))
print (mask)
       A      B      C
0   True  False  False
1  False   True  False
2  False  False  False

Then if need filter add any for check at least one True per row with boolean indexing:

df1 = df[mask.any(axis=1)]
print (df1)
           A         B  C
0  hello fgf         d  4
1          s  ff hello  7

EDIT:

tested = 'hello'
mask = df.applymap(lambda x:  tested.lower() in str(x).lower())
print (mask)
       A      B      C
0  False  False  False
1  False  False   True

Upvotes: 8

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

You can also concatenate all columns into one string and search for your substring in the concatenated string:

In [21]: df[df.astype(str).add('|').sum(1).str.contains('hello')]
Out[21]:
           A         B  C
0  hello fgf         d  4
1          s  ff hello  7

Explanation:

In [22]: df.astype(str).add('|').sum(1)
Out[22]:
0    hello fgf|d|4|
1     s|ff hello|7|
2            f|f|8|
dtype: object

Upvotes: 2

Related Questions