I have a dataframe as follows:
<A> "B" C _:D <E>
A B "C" <D> E>
<A> "B" "C" D <E>
I am trying to find a way that will check which elements begin with '<' or '"' or '_:' and return a dataframe as follows:
1 1 0 1 1
0 0 1 1 0
1 1 1 0 1
I would like to do this without using apply, because the dataframe is large. Ideally, my final dataframe would look like this:
<A> "B" C _:D <E> 4
A B "C" <D> E> 2
<A> "B" "C" D <E> 4
Thank you
Upvotes: 2
Views: 1184
UPDATE:
How do I add to the original dataframe a column containing the sum of the 1s found with stack + unstack?
In [59]: df['new'] = df.stack().str.contains(r'^(?:\"|<|_:)').astype(np.uint8).groupby(level=0).sum()
In [60]: df
Out[60]:
0 1 2 3 4 new
0 <A> "B" C _:D <E> 4
1 A B "C" <D> E> 2
2 A< B" C" D E< 0 # pay attention to this row
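The per-row count can also be reproduced as a standalone script. This is a minimal sketch, assuming the frame holds plain strings in integer-labelled columns; the sample data is copied from the output above, and the variable names `pattern`, `flags` and `counts` are only illustrative. It sums per row with `groupby(level=0).sum()`, since recent pandas releases no longer accept a `level` argument to `sum`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the output above; the last row has the
# markers only at the END of each string, so it should count 0.
df = pd.DataFrame([
    ['<A>', '"B"', 'C', '_:D', '<E>'],
    ['A', 'B', '"C"', '<D>', 'E>'],
    ['A<', 'B"', 'C"', 'D', 'E<'],
])

# ^(?:...) anchors every alternative to the start of the string.
pattern = r'^(?:"|<|_:)'

# stack() reshapes the frame into one long Series indexed by (row, column),
# so the string test runs in a single vectorized call.
flags = df.stack().str.contains(pattern).astype(np.uint8)

# Grouping on the first index level (the original row label) and summing
# gives the number of matching cells per row.
counts = flags.groupby(level=0).sum()

# Assignment aligns on the row index.
df['new'] = counts

print(df)
```

Computing `counts` before assigning keeps the new column out of the stacked data, so rerunning the last line does not change the result.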
Old answer:
Try this:
df.apply(lambda col: col.str.contains(r'^(?:\"|<|_:)').astype(np.uint8))
Demo:
In [33]: df.apply(lambda col: col.str.contains(r'^(?:\"|<|_:)').astype(np.uint8))
Out[33]:
0 1 2 3 4
0 1 1 0 1 1
1 0 0 1 1 0
2 1 1 1 0 1
Or using stack() + unstack():
In [36]: df.stack().str.contains(r'^(?:\"|<|_:)').astype(np.uint8).unstack()
Out[36]:
0 1 2 3 4
0 1 1 0 1 1
1 0 0 1 1 0
2 1 1 1 0 1
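A note on why the alternatives need to be grouped: in a pattern written as `^"|<|_:` the `^` binds only to the first alternative, so `<` and `_:` match anywhere in the string. The sketch below compares that form with the grouped `^(?:"|<|_:)` and with `str.match`, which anchors at the start of the string by itself. The third row of the sample data comes from the update's example; the names `loose`, `strict` and `matched` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['<A>', '"B"', 'C', '_:D', '<E>'],
    ['A', 'B', '"C"', '<D>', 'E>'],
    ['A<', 'B"', 'C"', 'D', 'E<'],   # markers at the end only
])

stacked = df.stack()

# Ungrouped: ^ applies to the quote only, so '<' and '_:' match anywhere.
loose = stacked.str.contains(r'^"|<|_:').astype(np.uint8).unstack()

# Grouped: every alternative must sit at the start of the string.
strict = stacked.str.contains(r'^(?:"|<|_:)').astype(np.uint8).unstack()

# str.match only matches at the start of each string, so no ^ is needed.
matched = stacked.str.match(r'"|<|_:').astype(np.uint8).unstack()

print(loose)    # last row flags 'A<' and 'E<'
print(strict)   # last row is all zeros
```

On the first two rows all three variants agree, which is why the difference does not show up in the demo output above.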
Upvotes: 7