Reputation: 1222
I am applying multiple filters on a dataframe at the same time.
data_df[(data_df['1']!=0) & (data_df['2']==0) & (data_df['3']==0) & (data_df['4']==0) & (data_df['5']==0)]
I needed to know is there any optimized way to do this? As I want to compare one column's value as !=0 and others value as =0 multiple times and columns could be more than 5. So, all operations will be:
data_df[(data_df['1']==0) & (data_df['2']!=0) & (data_df['3']==0) & (data_df['4']==0) & (data_df['5']==0)]
data_df[(data_df['1']==0) & (data_df['2']==0) & (data_df['3']!=0) & (data_df['4']==0) & (data_df['5']==0)]
data_df[(data_df['1']==0) & (data_df['2']==0) & (data_df['3']==0) & (data_df['4']!=0) & (data_df['5']==0)]
data_df[(data_df['1']==0) & (data_df['2']==0) & (data_df['3']==0) & (data_df['4']==0) & (data_df['5']!=0)]
Looking for a short and optimized method.
Upvotes: 1
Views: 1103
Reputation: 863301
One idea is compare by numpy array filled by 1
and 0
and test if all values matching by numpy.all
:
#test list - all 0, first 1
L = [1,0,0,0,0]
df = data_df[np.all(data_df == np.array(L), axis=1)]
Or use DataFrame.merge
by one row DataFrame
:
df = data_df.merge(pd.DataFrame([L], columns=data_df.columns))
Sample:
np.random.seed(2020)
data_df = pd.DataFrame(np.random.randint(2, size=(100, 5)), columns=list('12345'))
#print (data_df)
df = data_df[np.all(data_df == np.array(L), axis=1)]
print (df)
1 2 3 4 5
2 1 0 0 0 0
13 1 0 0 0 0
44 1 0 0 0 0
58 1 0 0 0 0
70 1 0 0 0 0
89 1 0 0 0 0
Or:
L = [1,0,0,0,0]
df = data_df.merge(pd.DataFrame([L], columns=data_df.columns))
print (df)
1 2 3 4 5
0 1 0 0 0 0
1 1 0 0 0 0
2 1 0 0 0 0
3 1 0 0 0 0
4 1 0 0 0 0
5 1 0 0 0 0
Solution with merge
should be used with helper DataFrame
with all combinations:
df1 = pd.DataFrame(0, index=data_df.columns, columns=data_df.columns)
np.fill_diagonal(df1.to_numpy(), 1)
print (df1)
1 2 3 4 5
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 1 0 0
4 0 0 0 1 0
5 0 0 0 0 1
df = data_df.merge(df1.loc[['1']])
print (df)
1 2 3 4 5
0 1 0 0 0 0
1 1 0 0 0 0
2 1 0 0 0 0
3 1 0 0 0 0
4 1 0 0 0 0
5 1 0 0 0 0
df = data_df.merge(df1.loc[['2']])
print (df)
1 2 3 4 5
0 0 1 0 0 0
1 0 1 0 0 0
Upvotes: 2
Reputation: 75120
Based on the below statements:
Looking for a short and optimized method
and
I want to compare one column's value as !=0 and others value as =0
You can use df.ne
and df.eq
with df.drop
on axis=1
to drop the column 1
:
data_df[data_df[1].ne(0) & data_df.drop(1,axis=1).eq(0).all(axis=1)]
Upvotes: 2
Reputation: 477318
We can first calculate the dataframe, such that for the given columns we only calculate once if the value is equal to zero or not.
df_bool = df[['1', '2', '3', '4', '5']] == 0
Next we can use this as a mask:
df[~df_bool[1] & df[[2, 3, 4, 5]].all(axis=1)]
Upvotes: 2