Pandas StartsWith multiple options

Question

I have a dataframe as follows:

  <A>   "B"    C    _:D   <E>
  A     B    "C"    <D>    E>
  <A>   "B"   "C"     D    <E>

I am trying to find a way that will check which elements begin with '<' or '"' or '_:' and return a dataframe as follows:

  1     1     0     1     1
  0     0     1     1     0
  1     1     1     0     1

I would like to avoid apply because of the size of the dataframe. Ideally, my final dataframe would look like this:

  <A>   "B"    C    _:D   <E>    4
  A     B    "C"    <D>    E>    2
  <A>   "B"   "C"     D    <E>    4

Thank you

MaxU - stand with Ukraine · Accepted Answer

UPDATE:

How can I add a column to the original dataframe containing the sum of the 1s found by stack + unstack?

In [59]: df['new'] = df.stack().str.contains(r'^(?:\"|<|_:)').astype(np.uint8).groupby(level=0).sum()

In [60]: df
Out[60]:
     0    1    2    3    4  new
0  <A>  "B"    C  _:D  <E>    4
1    A    B  "C"  <D>   E>    2
2   A<   B"   C"    D   E<    0  # pay attention at this row
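
The flagged row gets 0 because none of its values start with one of the prefixes; they only contain them. That works only when `^` applies to every alternative, which is what the non-capturing group `(?:...)` is for. A minimal sketch of the difference, using a hypothetical Series `s` of sample values that is not part of the original post:

```python
import pandas as pd

# Hypothetical sample values: some start with a prefix, some only contain one.
s = pd.Series(['<A>', 'A<', '"B"', 'B"', '_:D', 'D_:', 'C'])

# Here '^' binds only to the first alternative, so '<' and '_:' match anywhere.
loose = s.str.contains(r'^"|<|_:')

# The non-capturing group makes '^' apply to all three alternatives.
strict = s.str.contains(r'^(?:"|<|_:)')

print(pd.DataFrame({'value': s, 'loose': loose, 'strict': strict}))
```

With the ungrouped pattern, values such as `A<` and `D_:` are counted as matches even though they do not start with a prefix.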

Old answer:

Try this:

df.apply(lambda col: col.str.contains(r'^(?:\"|<|_:)').astype(np.uint8))

Demo:

In [33]: df.apply(lambda col: col.str.contains(r'^(?:\"|<|_:)').astype(np.uint8))
Out[33]:
   0  1  2  3  4
0  1  1  0  1  1
1  0  0  1  1  0
2  1  1  1  0  1

Or using stack() + unstack():

In [36]: df.stack().str.contains(r'^(?:\"|<|_:)').astype(np.uint8).unstack()
Out[36]:
   0  1  2  3  4
0  1  1  0  1  1
1  0  0  1  1  0
2  1  1  1  0  1
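
For reference, the stack-based approach can be put together as one self-contained script. This is only a sketch: the sample frame is an assumption reconstructed from the expected output in the question, and `.groupby(level=0).sum()` is used for the per-row count because `sum(level=0)` is no longer available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

# Sample frame assumed from the expected output in the question.
df = pd.DataFrame([
    ['<A>', '"B"', 'C',   '_:D', '<E>'],
    ['A',   'B',   '"C"', '<D>', 'E>'],
    ['<A>', '"B"', '"C"', 'D',   '<E>'],
])

# One long Series indexed by (row, column), tested with a single vectorized regex.
flags = df.stack().str.contains(r'^(?:"|<|_:)').astype(np.uint8)

mask = flags.unstack()                 # 0/1 frame with the original shape
counts = flags.groupby(level=0).sum()  # number of matching cells per row

# Attach the counts as a new column, aligned on the row index.
result = df.assign(new=counts)
print(mask)
print(result)
```

Stacking first means the string test runs once over a single Series rather than once per column, and both the 0/1 frame and the row counts come from the same intermediate result.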
