Let's assume I have a pandas DataFrame in Python that shows the name of the business unit leader for different units over time. It looks something like this and can be recreated with:
import pandas as pd
import numpy as np

d = pd.DataFrame({'Boss_January': ['Nina', 'Lena', 'Max', np.nan],
                  'Boss_February': ['Nina', 'Emilia', 'Max', 'Leonie'],
                  'Boss_March': ['Nina', 'Lena', 'Mark', 'Leonie']})
I would like to find the units where there has been a change in the business unit leader, i.e. rows where the values across the columns aren't all the same.
The result should include Unit 1, Unit 2 and Unit 3, because the leader changed or a value is missing, but exclude Unit 0.
Since the real DataFrame has many more columns, I don't want to iterate through the rows and check whether Boss_January == Boss_February == Boss_March.
Upvotes: 2
Views: 3838
You can check each column for equality with the first column, test whether all values in a row are True, then take the negation:
res = d[~d.eq(d.iloc[:, 0], axis=0).all(axis=1)]
print(res)
  Boss_January Boss_February Boss_March
1         Lena        Emilia       Lena
2          Max           Max       Mark
3          NaN        Leonie     Leonie
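To make the steps concrete, this is what the intermediate equality check produces on the question's d (note that NaN never compares equal to NaN, which is why row 3 is flagged even though two of its values match):

mask = d.eq(d.iloc[:, 0], axis=0)  # compare every column to the first one
print(mask)
   Boss_January  Boss_February  Boss_March
0          True           True        True
1          True          False        True
2          True           True       False
3         False          False       False

mask.all(axis=1) then collapses each row to a single boolean, and ~ flips it, keeping rows 1, 2 and 3.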
If you just need the indices, subsetting the DataFrame is not required:
bools = ~d.eq(d.iloc[:, 0], axis=0).all(axis=1)
idx = bools[bools].index
print(idx)
Index([1, 2, 3], dtype='int64')
Upvotes: 2
You can use apply with axis=1 to aggregate over the columns and keep the rows that have more than one unique element:
d[d.apply(lambda x: len(x.unique()) != 1, axis=1)]
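If apply over many columns turns out to be slow, a vectorized sketch of the same idea (not part of the original answer; it assumes a pandas version where DataFrame.nunique accepts axis and dropna) uses nunique with dropna=False, which, like len(x.unique()), treats NaN as its own value:

# count distinct values per row, including NaN, and keep rows with more than one
d[d.nunique(axis=1, dropna=False) != 1]

This returns the same rows 1, 2 and 3 for the question's d.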
Upvotes: 1