D.prd

Reputation: 675

What is the simplest way to check for the occurrence of a character/substring in DataFrame values?

Consider a pandas DataFrame that has values such as 'a - b'. I would like to check whether '-' occurs anywhere in the DataFrame's values without looping through individual columns. Clearly, a check such as the following won't work:

if '-' in df.values

Any suggestions on how to check for this? Thanks.

Upvotes: 1

Views: 110

Answers (4)

piRSquared

Reputation: 294278

You can use replace to swap a regex match with something else, then check for equality:

df.replace('.*-.*', True, regex=True).eq(True)
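
The expression above yields a Boolean frame rather than a single answer, and a numeric 1 would also compare equal to True. A minimal sketch of one way to get a single True/False while avoiding that, assuming a small sample frame (the data and the names mask and has_dash are illustrative, not from the question):

```python
import pandas as pd

# Illustrative data; not from the original question.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["a - b", "c", "d"], "c": ["w", "z", "2 - 3"]})

# Cast every cell to str so numeric cells are compared as text,
# then test each column for the literal substring (no regex).
mask = df.astype(str).apply(lambda col: col.str.contains("-", regex=False))

# Collapse per column first, then collapse the resulting Series.
has_dash = bool(mask.any().any())
print(has_dash)
```

This swaps replace for str.contains, so it is a variation on the idea rather than the same technique.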

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210842

I'd use stack() + .str.contains() in this case:

In [10]: df
Out[10]:
   a      b      c
0  1  a - b      w
1  2      c      z
2  3      d  2 - 3

In [11]: df.stack().str.contains('-').any()
Out[11]: True

In [12]: df.stack().str.contains('-')
Out[12]:
0  a      NaN
   b     True
   c    False
1  a      NaN
   b    False
   c    False
2  a      NaN
   b    False
   c     True
dtype: object
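
The NaN entries above come from the non-string cells in column a. If you also want to know where the matches are, a sketch along these lines may help, assuming the same sample frame; passing na=False turns those NaN results into False so the Series can be used as a mask (stacked, hits and matches are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["a - b", "c", "d"], "c": ["w", "z", "2 - 3"]})

stacked = df.stack()

# na=False replaces the NaN produced for non-string cells with False,
# giving a plain Boolean Series; regex=False treats "-" as a literal.
hits = stacked.str.contains("-", regex=False, na=False)

# Keep only the matching cells; the index holds (row label, column label) pairs.
matches = stacked[hits]
print(matches.index.tolist())
```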

Upvotes: 1

Ken Wei

Reputation: 3130

Using NumPy: np.core.defchararray.find(a, s) returns, for each element of a, the lowest index at which the substring s appears; if s is not present in that element, -1 is returned.

(np.core.defchararray.find(df.values.astype(str),'-') > -1).any()

returns True if '-' is present anywhere in df.
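
As a self-contained sketch of the same idea, the function is also exposed under the public name np.char.find, which avoids reaching into np.core. The sample frame and the names cells and positions are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["a - b", "c", "d"], "c": ["w", "z", "2 - 3"]})

# np.char.find works on string arrays, so cast the whole frame first.
cells = df.to_numpy().astype(str)

# positions[i, j] is the index of the first "-" in that cell, or -1 if absent.
positions = np.char.find(cells, "-")

print(positions)
print(bool((positions > -1).any()))
```

Because every cell is cast to str, a negative number such as -1 would also count as a match.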

Upvotes: 0

niraj

Reputation: 18208

One way is to flatten df.values and filter the result with a list comprehension.

df = pd.DataFrame([['val1','a-b', 'val3'],['val4','3', 'val5']],columns=['col1','col2', 'col3'])
print(df)

Output:

   col1 col2  col3
0  val1  a-b  val3
1  val4    3  val5

Now, to search for -:

find_value = [val for val in df.values.flatten() if '-' in val]
print(find_value)

Output:

['a-b']
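
If only a yes/no answer is needed, a generator passed to any() can stop at the first match instead of building the whole list. A minimal sketch, assuming the same frame as above; the isinstance guard is an addition so that non-string cells do not raise a TypeError:

```python
import pandas as pd

df = pd.DataFrame([["val1", "a-b", "val3"], ["val4", "3", "val5"]],
                  columns=["col1", "col2", "col3"])

# any() stops at the first hit; the isinstance check skips non-string cells
# (numbers, NaN) for which `'-' in val` would raise a TypeError.
found = any(isinstance(val, str) and "-" in val for val in df.values.flatten())
print(found)
```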

Upvotes: 0
