Reputation: 473
import pandas as pd
data = {'Unique_ID': [44,66,12,76,57,83],
'Location 1': ['AA', 'BB','CC','DD','EE','FF'],
'Location 2': ['FF', 'CC','AA','EE','BB','CC'],
'Rank':[1,2,3,4,5,6]
}
pd.DataFrame (data)
Here is the sample dataset which should look like as shown below
Expected Output:
It is sorted based on the rank. I am looking for a final dataframe(df2) with a best 'location 1' and 'location2' combination based on the rank. Example:
Iteration-1: As this is the best so there is nothing to compare with so we straight away insert first record in the new dataframe (df2).
Iteration-2: Compare Location1 and Location2 if any of this exists in new dataframe(d2) of lcoation1 and location2 then do not insert else insert In this example we need to insert rank 2 records as "BB" and "CC" doesnt exists before.
Iteration-3:* Check if "CC" or "AA" exists in new data frame(d2). If either one or both presents do not update it into new dataframe. In this case both of them exits hence do not update this record into new dataframe.
Iteration-4: Check if DD or EE is present in new dataframe. We dont see both of them in new dataframe hence update this record into new dataframe
And iterating it through each record .........
Thanks in Advance
Upvotes: 1
Views: 119
Reputation: 71689
Use:
m1 = df[['Location 1', 'Location 2']].stack().duplicated()
m2 = ~m1.any(level=0)
df = df[m2]
Details:
Use DataFrame.stack
to reshape the dataframe with columns Location 1
and Location 2
to Multilevel
series and use Series.duplicated
to create a mask m1
.
print(m1)
0 Location 1 False
Location 2 False
1 Location 1 False
Location 2 False
2 Location 1 True
Location 2 True
3 Location 1 False
Location 2 False
4 Location 1 True
Location 2 True
5 Location 1 True
Location 2 True
dtype: bool
Use Series.any
on level=0
on the mask m1
and negate this to create a new boolean mask m2
. Thanks to @ScottBoston for suggestion.
print(m2)
0 True
1 True
2 False
3 True
4 False
5 False
dtype: bool
Finally using this mask m2
filter the rows in the dataframe:
print(df)
Unique_ID Location 1 Location 2 Rank
0 44 AA FF 1
1 66 BB CC 2
3 76 DD EE 4
Upvotes: 4