Reputation: 321
Assume I have df1
:
df1= pd.DataFrame({'alligator_apple': range(1, 11),
'barbadine': range(11, 21),
'capulin_cherry': range(21, 31)})
alligator_apple barbadine capulin_cherry
0 1 11 21
1 2 12 22
2 3 13 23
3 4 14 24
4 5 15 25
5 6 16 26
6 7 17 27
7 8 18 28
8 9 19 29
9 10 20 30
And a df2
:
df2= pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
'barbadine': [3, 19, 25, 12],
'capulin_cherry': [1, 9, 15, 27]})
alligator_apple barbadine capulin_cherry
0 6 3 1
1 7 19 9
2 15 25 15
3 5 12 27
I'm looking for a way to create a new column in df2
that gets number of rows based on a condition where all columns in df1
has values greater than their counterparts in df2
for each row. For example:
alligator_apple barbadine capulin_cherry greater
0 6 3 1 4
1 7 19 9 1
2 15 25 15 0
3 5 12 27 3
To elaborate, at row 0 of df2
, df1.alligator_apple
has 4 rows which values are higher than df2.alligator_apple
with the value of 6. df1.barbadine
has 10 rows which values are higher than df2.barbadine
with value of 3, while similarly df1.capulin_cherry
has 10 rows.
Finally, apply an 'and' condition to all aforementioned conditions to get the number '4' of df2.greater
of first row. Repeat for the rest of rows in df2
.
Is there a simple way to do this?
Upvotes: 2
Views: 1121
Reputation: 380
There's a general solution to this that should work well.
def gt_mask(row,df):
mask = True
for key,val in row.items():
mask &= df[key] > val
return len(df[mask])
df2['greater'] = df2.apply(gt_mask,df=df1,axis=1)
Output df2
,alligator_apple,barbadine,capulin_cherry,greater
0,6,3,1,4
1,7,19,9,1
2,15,25,15,0
3,5,12,27,3
This creates a mask, iterating through the key/val pairs for a given row.
Edit this answer was a big help: Masking a DataFrame on multiple column conditions - inside a loop
Upvotes: 2
Reputation: 7509
I believe this does what you want:
df2['greater'] = df2.apply(
lambda row:
(df1['alligator_apple'] > row['alligator_apple']) &
(df1['barbadine'] > row['barbadine']) &
(df1['capulin_cherry'] > row['capulin_cherry']),
axis=1,
).sum(axis=1)
print(df2)
output:
alligator_apple barbadine capulin_cherry greater
0 6 3 1 4
1 7 19 9 1
2 15 25 15 0
3 5 12 27 3
Edit: if you want to generalize and apply this logic for a given column set, we can use functools.reduce
together with operator.and_
:
import functools
import operator
columns = ['alligator_apple', 'barbadine', 'capulin_cherry']
df2['greater'] = df2.apply(
lambda row: functools.reduce(
operator.and_,
(df1[column] > row[column] for column in columns),
),
axis=1,
).sum(axis=1)
Upvotes: 2