Rahul Agarwal
Rahul Agarwal

Reputation: 4100

Match values of multiple columns by using 2 columns

Sample DF:

ID   Name     Match1    Random_Col    Match2    Price    Match3     Match4       Match5
1    Apple      Yes     Random Value   No        10      Yes        Yes          Yes
2    Apple      Yes     Random Value1  No        10      Yes        Yes          No
3    Apple      Yes     Random Value2  No        15      No         Yes          Yes
4    Orange     No      Random Value   Yes       12      Yes        Yes          No
5    Orange     No      Random Value   Yes       12      No         No           No
6    Banana     Yes     Random Value   No        15      Yes        No           No
7    Apple      Yes     Random Value   No        15      No        Yes          Yes

Expected DF:

ID   Name     Match1    Random_Col    Match2  Price Match3  Match4 Match5 Final_Match
1    Apple      Yes     Random Value   No      10    Yes    Yes    Yes   Full
2    Apple      Yes     Random Value1  No      10    Yes    Yes    No  Partial
3    Apple      Yes     Random Value2  No      15    No     Yes    Yes Partial
4    Orange     No      Random Value   Yes     12    Yes    Yes    No    Full
5    Orange     No      Random Value   Yes     12    No     No     No Partial
6    Banana     Yes     Random Value   No      15    Yes    No     No   Full
7    Apple      Yes     Random Value   No      15    No     Yes    Yes Partial

Problem Statement:

  1. If combination Name and Price is non-repetitive simply put Full in Final_Match column (Example ID 6)
  2. If the combination Name and Price are repetitive then within them count Yes in Match1 to Match5 columns, whichever has greater "Yes" put Full for that one and Partial for the other (Example ID 1 & 2 and 4,5)

  3. If the combination Name and Price are repetitive then within an ID count Yes in Match1 to Match5 columns,if they have equal "Yes" put Partial in both (Example ID 3,7)

Code

s = (df.replace({'Yes': 1, 'No': 0})
     .iloc[:, 1:]
     .sum(1))

df['final_match'] = np.where(s.groupby(df[['Price','Name']]).rank(ascending=False).eq(1), 'Full ','Partial')

The above code works when I had to groupby by only 1 column lets say Name but it is not working for combination.

Any help!!

Upvotes: 1

Views: 70

Answers (1)

jezrael
jezrael

Reputation: 862511

Use:

#count Yes values only in Match columns
s = df.filter(like='Match').eq('Yes').sum(axis=1)
#mask for unique combinations
m1 = ~df.duplicated(['Price','Name'], keep=False)
#create new column filled by Series s
m2 = df.assign(new=s).groupby(['Price','Name'])['new'].rank(ascending=False).eq(1)
#chain masks by bitwise OR
df['final_match'] = np.where(m1 | m2, 'Full ','Partial')
print (df)

   ID    Name Match1     Random_Col Match2  Price Match3 Match4 Match5  \
0   1   Apple    Yes   Random Value     No     10    Yes    Yes    Yes   
1   2   Apple    Yes  Random Value1     No     10    Yes    Yes     No   
2   3   Apple    Yes  Random Value2     No     15     No    Yes    Yes   
3   4  Orange     No   Random Value    Yes     12    Yes    Yes     No   
4   5  Orange     No   Random Value    Yes     12     No     No     No   
5   6  Banana    Yes   Random Value     No     15    Yes     No     No   
6   7   Apple    Yes   Random Value     No     15     No    Yes    Yes   

  final_match  
0       Full   
1     Partial  
2     Partial  
3       Full   
4     Partial  
5       Full   
6     Partial  

Upvotes: 1

Related Questions