Pandas DataFrame- Iterate and compare with before values

Question

import pandas as pd

data = {'Unique_ID':  [44,66,12,76,57,83],
        'Location 1': ['AA', 'BB','CC','DD','EE','FF'],
        'Location 2': ['FF', 'CC','AA','EE','BB','CC'],
        'Rank':[1,2,3,4,5,6]
    }
pd.DataFrame (data)

Here is the sample dataset which should look like as shown below

Expected Output:

It is sorted based on the rank. I am looking for a final dataframe(df2) with a best 'location 1' and 'location2' combination based on the rank. Example:

Iteration-1: As this is the best so there is nothing to compare with so we straight away insert first record in the new dataframe (df2).
Iteration-2: Compare Location1 and Location2 if any of this exists in new dataframe(d2) of lcoation1 and location2 then do not insert else insert In this example we need to insert rank 2 records as "BB" and "CC" doesnt exists before.
Iteration-3:* Check if "CC" or "AA" exists in new data frame(d2). If either one or both presents do not update it into new dataframe. In this case both of them exits hence do not update this record into new dataframe.
Iteration-4: Check if DD or EE is present in new dataframe. We dont see both of them in new dataframe hence update this record into new dataframe

And iterating it through each record .........

Thanks in Advance

Shubham Sharma · Accepted Answer

Use:

m1 = df[['Location 1', 'Location 2']].stack().duplicated()
m2 = ~m1.any(level=0)
df = df[m2]

Details:

Use DataFrame.stack to reshape the dataframe with columns Location 1 and Location 2 to Multilevel series and use Series.duplicated to create a mask m1.

print(m1)
0  Location 1    False
   Location 2    False
1  Location 1    False
   Location 2    False
2  Location 1     True
   Location 2     True
3  Location 1    False
   Location 2    False
4  Location 1     True
   Location 2     True
5  Location 1     True
   Location 2     True
dtype: bool

Use Series.any on level=0 on the mask m1 and negate this to create a new boolean mask m2. Thanks to @ScottBoston for suggestion.

print(m2)
0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

Finally using this mask m2 filter the rows in the dataframe:

print(df)
   Unique_ID Location 1 Location 2  Rank
0         44         AA         FF     1
1         66         BB         CC     2
3         76         DD         EE     4

Pandas DataFrame- Iterate and compare with before values

Answers (1)

Related Questions