Kelaref

Reputation: 517

Pandas StartsWith multiple options

I have a dataframe as follows:

 <A>   "B"    C    _:D   <E>
  A     B    "C"    <D>   E>
 <A>   "B"   "C"     D   <E>

I am trying to find a way to check which elements begin with '<', '"', or '_:' and return a dataframe as follows:

  1     1     0     1     1
  0     0     1     1     0
  1     1     1     0     1

I would like to avoid apply because of the size of the dataframe. Ideally, my final dataframe would gain a column with the per-row count of matches, as follows:

 <A>   "B"    C    _:D   <E>    4
  A     B    "C"    <D>   E>    2
 <A>   "B"   "C"     D   <E>    4

Thank you

Upvotes: 2

Views: 1184

Answers (1)

MaxU - stand with Ukraine

Reputation: 210912

UPDATE:

How to add a column to the original dataframe containing the per-row sum of the 1s found with stack()?

In [59]: df['new'] = df.stack().str.contains(r'^(?:\"|<|_:)').astype(np.uint8).groupby(level=0).sum()

In [60]: df
Out[60]:
     0    1    2    3    4  new
0  <A>  "B"    C  _:D  <E>    4
1    A    B  "C"  <D>   E>    2
2   A<   B"   C"    D   E<    0  # pay attention to this row
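
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question, and it uses groupby(level=0).sum() for the per-row total because sum(level=0) was removed in pandas 2.0. The names pattern, flags and mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame([
    ['<A>', '"B"', 'C', '_:D', '<E>'],
    ['A', 'B', '"C"', '<D>', 'E>'],
    ['<A>', '"B"', '"C"', 'D', '<E>'],
])

# The non-capturing group makes ^ apply to every alternative
pattern = r'^(?:"|<|_:)'

# stack() turns the frame into one long Series indexed by (row, column),
# so the vectorized .str accessor can be used without apply
flags = df.stack().str.contains(pattern).astype(np.uint8)

# 0/1 frame with the same shape as the original
mask = flags.unstack()

# Per-row count of matching cells, aligned back on the row index
df['new'] = flags.groupby(level=0).sum()

print(mask)
print(df)
```

With the question's data, the new column should come out as 4, 2, 4.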

Old answer:

Try this:

df.apply(lambda col: col.str.contains(r'^(?:\"|<|_:)').astype(np.uint8))

Demo:

In [33]: df.apply(lambda col: col.str.contains(r'^(?:\"|<|_:)').astype(np.uint8))
Out[33]:
   0  1  2  3  4
0  1  1  0  1  1
1  0  0  1  1  0
2  1  1  1  0  1

Or using stack() + unstack():

In [36]: df.stack().str.contains(r'^(?:\"|<|_:)').astype(np.uint8).unstack()
Out[36]:
   0  1  2  3  4
0  1  1  0  1  1
1  0  0  1  1  0
2  1  1  1  0  1
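
A note on the pattern: in a regex, alternation binds more loosely than the ^ anchor, so an ungrouped pattern such as ^\"|<|_: anchors only the first alternative and matches < or _: anywhere in the string. Wrapping the alternatives in a non-capturing group anchors all of them. The sketch below illustrates the difference with the standard re module; the sample strings and variable names are made up for the demonstration.

```python
import re

# Ungrouped: only the double quote is anchored to the start
loose = re.compile(r'^"|<|_:')
# Grouped: every alternative must appear at the start
anchored = re.compile(r'^(?:"|<|_:)')

# Hypothetical sample values, some with the markers in non-leading positions
samples = ['<A>', '"B"', 'A<', 'B"', 'x_:y', '_:D']

loose_hits = [bool(loose.search(s)) for s in samples]
anchored_hits = [bool(anchored.search(s)) for s in samples]

for s, lo, an in zip(samples, loose_hits, anchored_hits):
    print(f'{s!r:8} loose={lo!s:5} anchored={an}')
```

Here 'A<' and 'x_:y' are matched by the ungrouped pattern but not by the grouped one, which is why values like A< or E< would be miscounted as "starting with" a marker.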

Upvotes: 7
