Reputation: 105
Let's say I have a pandas DataFrame with four columns of strings. To check whether all four are the same in each row, I came up with something like this:
df['same_FB'] = np.where( (df['FB_a'] == df['FB_b']) & (df['FB_a'] == df['FB_c']) & (df['FB_a'] == df['FB_d']), 1,0)
It works fine, but it doesn't look good, and if I had to add a fifth or sixth column it gets even uglier. Is there another way to test whether all columns are the same? Alternatively, I would be OK with counting the distinct values in these four columns.
Upvotes: 1
Views: 172
Reputation: 71707
You can use DataFrame.eq + DataFrame.all:
x, *y = ['FB_a', 'FB_b', 'FB_c', 'FB_d']
df['same_FB'] = df[y].eq(df[x], axis=0).all(axis=1).astype('i1')
Alternatively, you can use nunique:
c = ['FB_a', 'FB_b', 'FB_c', 'FB_d']
df['same_FB'] = df[c].nunique(axis=1, dropna=False).eq(1).astype('i1')
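The two approaches can disagree on rows that contain missing values, because `NaN == NaN` is False while `nunique(dropna=False)` counts NaN as a single value. A minimal sketch on a made-up numeric frame (the column names `a`, `b`, `c` and the data are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): row 0 matches, row 1 differs, row 2 is all NaN.
df = pd.DataFrame({
    "a": [1.0, 1.0, np.nan],
    "b": [1.0, 2.0, np.nan],
    "c": [1.0, 1.0, np.nan],
})

first, *rest = ["a", "b", "c"]

# eq/all: NaN never compares equal to NaN, so the all-NaN row is False.
same_eq = df[rest].eq(df[first], axis=0).all(axis=1)

# nunique with dropna=False: NaN counts as one distinct value,
# so the all-NaN row has exactly one unique value and is True.
same_nunique = df[["a", "b", "c"]].nunique(axis=1, dropna=False).eq(1)

print(same_eq.tolist())
print(same_nunique.tolist())
```

Which behavior is preferable depends on whether an all-missing row should count as "the same".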
Example:
print(df)
A B C D E
0 10 1 1 1 1
1 20 2 2 2 2
2 30 3 3 3 3
3 40 4 4 4 4
x, *y = ['B', 'C', 'D', 'E']
df['same'] = df[y].eq(df[x], axis=0).all(axis=1).astype('i1')
print(df)
A B C D E same
0 10 1 1 1 1 1
1 20 2 2 2 2 1
2 30 3 3 3 3 1
3 40 4 4 4 4 1
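In the example above every row matches, so here is a small self-contained sketch with string columns where some rows differ. The data is invented for illustration; the `n_distinct` column is a hypothetical name for the per-row distinct count the question mentions as an alternative.

```python
import pandas as pd

# Made-up string data: only the first row has identical values everywhere.
df = pd.DataFrame({
    "FB_a": ["x", "y", "z"],
    "FB_b": ["x", "y", "q"],
    "FB_c": ["x", "y", "z"],
    "FB_d": ["x", "n", "z"],
})

cols = ["FB_a", "FB_b", "FB_c", "FB_d"]
first, *rest = cols

# 1 where every remaining column equals the first column, else 0.
df["same_FB"] = df[rest].eq(df[first], axis=0).all(axis=1).astype("int8")

# Number of distinct values per row across the four columns.
df["n_distinct"] = df[cols].nunique(axis=1)

print(df)
```

Adding a fifth or sixth column only requires extending `cols`.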
Upvotes: 2
Reputation: 2859
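Python's chained comparison (a == b == c == d) does not work on pandas Series: it expands to (a == b) and (b == c) and (c == d), and `and` needs a single truth value, so pandas raises a ValueError. You can get the same chain-style logic by comparing neighbouring columns and combining the results with &:
df['same_FB'] = np.where((df['FB_a'] == df['FB_b']) & (df['FB_b'] == df['FB_c']) & (df['FB_c'] == df['FB_d']), 1, 0)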
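For an arbitrary number of columns, the neighbour-to-neighbour comparisons can be generated in a loop and combined with `functools.reduce`. This is only a sketch on invented data; it also checks that a literal chained `==` on Series raises `ValueError`.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "FB_a": ["x", "y", "z"],
    "FB_b": ["x", "y", "q"],
    "FB_c": ["x", "y", "z"],
    "FB_d": ["x", "n", "z"],
})
cols = ["FB_a", "FB_b", "FB_c", "FB_d"]

# A literal chained comparison calls bool() on a Series, which pandas rejects.
try:
    df["FB_a"] == df["FB_b"] == df["FB_c"]
    chained_ok = True
except ValueError:
    chained_ok = False

# Compare each column with its right-hand neighbour, then AND the masks.
pairwise = [df[a] == df[b] for a, b in zip(cols, cols[1:])]
mask = reduce(operator.and_, pairwise)

df["same_FB"] = np.where(mask, 1, 0)
print(df)
```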
Upvotes: 1