HKE
HKE

Reputation: 473

Pandas DataFrame- Iterate and compare with before values

import pandas as pd

data = {'Unique_ID':  [44,66,12,76,57,83],
        'Location 1': ['AA', 'BB','CC','DD','EE','FF'],
        'Location 2': ['FF', 'CC','AA','EE','BB','CC'],
        'Rank':[1,2,3,4,5,6]
    }
pd.DataFrame (data)

Here is the sample dataset which should look like as shown below

enter image description here

Expected Output:

enter image description here

It is sorted based on the rank. I am looking for a final dataframe(df2) with a best 'location 1' and 'location2' combination based on the rank. Example:

And iterating it through each record .........

Thanks in Advance

Upvotes: 1

Views: 119

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use:

m1 = df[['Location 1', 'Location 2']].stack().duplicated()
m2 = ~m1.any(level=0)
df = df[m2]

Details:

Use DataFrame.stack to reshape the dataframe with columns Location 1 and Location 2 to Multilevel series and use Series.duplicated to create a mask m1.

print(m1)
0  Location 1    False
   Location 2    False
1  Location 1    False
   Location 2    False
2  Location 1     True
   Location 2     True
3  Location 1    False
   Location 2    False
4  Location 1     True
   Location 2     True
5  Location 1     True
   Location 2     True
dtype: bool

Use Series.any on level=0 on the mask m1 and negate this to create a new boolean mask m2. Thanks to @ScottBoston for suggestion.

print(m2)
0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

Finally using this mask m2 filter the rows in the dataframe:

print(df)
   Unique_ID Location 1 Location 2  Rank
0         44         AA         FF     1
1         66         BB         CC     2
3         76         DD         EE     4

Upvotes: 4

Related Questions