Reputation: 340
I have a following dataframe.(Actual columns in the dataframe are more than 30)
ID col1 col2 col3 col4
1 a## b## HO HO
2 c## d23 c## HO
3 a## k## z## s#
4 c## b12 a12 c12
5 b## HO z## HO
I want to make a new dataframe filtering rows that have all the strings ending with '##' and if one of the cells in the row contains 'HO',I want to skip it and return a row if the rest of the columns contain string ending with ##
In the above case new dataframe would be like this:
ID col1 col2 col3 col4
1 a## b## HO HO
3 a## k## z## s#
5 b## HO z## HO
currently I am doing
m = np.logical_and.reduce([
[x.endswith('##') for x in df[c] ] for c in df.columns if x!='HO'] )
But df[m] is only giving one following row and not the other two
3 a## k## z##
How can I fix this?
Upvotes: 2
Views: 3392
Reputation: 2239
You can try this:
import pandas as pd
import numpy as np
data = {'col1':['a##', 'c##', 'a##', 'c##', 'b##'], 'col2':['b##', 'd23', 'k##', 'b12', 'HO'], 'col3': ['HO', 'c##', 'z##', 'a12', 'z##']}
df = pd.DataFrame(data = data)
m = np.logical_and.reduce([[('##' in x) | (x=='HO') for x in df[c] ]for c in df.columns ])
print(df, '\n\n',df[m])
col1 col2 col3
0 a## b## HO
1 c## d23 c##
2 a## k## z##
3 c## b12 a12
4 b## HO z##
col1 col2 col3
0 a## b## HO
2 a## k## z##
4 b## HO z##
Upvotes: 0
Reputation: 402814
You can replace "HO" with a dummy string '##', store this result as a temporary DataFrame, compute the mask on the temp, and use that to index back into df
.
# df = df.set_index('ID') # Run this if not already done.
df
col1 col2 col3
ID
1 a## b## HO
2 c## d23 c##
3 a## k## z##
4 c## b12 a12
5 b## HO z##
u = df.replace('HO', '##')
m = np.logical_and.reduce([
[x.endswith('##') for x in u[c]] for c in u.columns]
)
df[m]
col1 col2 col3
ID
1 a## b## HO
3 a## k## z##
5 b## HO z##
Upvotes: 1
Reputation: 164773
You can combine Boolean filters as below. Note are only 2 valid scenarios: either you have 2 strings ending in '##'
and one equal to 'HO'
, or you have all 3 strings ending in '##'
.
suffix = df.iloc[:, 1:].apply(lambda s: s.str[-2:].eq('##')).sum(1)
value = df.iloc[:, 1:].eq('HO').sum(1)
res = df[(suffix.eq(2) & value.eq(1)) | suffix.eq(3)]
print(res)
ID col1 col2 col3
0 1 a## b## HO
2 3 a## k## z##
4 5 b## HO z##
Upvotes: 1