Reputation: 5107
I have a df :
That looks like:
A Wave Dept Ratio
1 1 1 A 10
2 4 1 A 6
3 7 2 A 3
4 2 2 A 4
5 6 2 B 9
6 5 2 B 7
7 5 1 B 12
8 5 1 B 4
9 5 2 C 9
10 5 1 C 3
11 5 1 C 4
12 5 2 C 7
I am currently grouping the data by Dept
and checking each row Ratio
against all other ratios with the same department and incrementing a count if the ratio value is greater than the threshold.
The code for this looks like:
import pandas as pd
lowerThreshold=1.2
def main():
print()
df = pd.DataFrame([
[1, 1, 'A', 10],
[4, 1 ,'A', 6],
[7, 2 ,'A', 3],
[2, 2 ,'A', 4],
[6, 2 ,'B', 9],
[5, 2 ,'B', 7],
[5, 1 ,'B', 12],
[5, 1 ,'B', 4],
[5, 2 ,'C', 9],
[5, 1 ,'C', 3],
[5, 1 ,'C', 4],
[5, 2 ,'C', 7]
],
index=['1', '2', '3','4','5','6','7','8','9','10','11','12'],
columns=['A', 'Wave', 'Dept', 'Ratio'])
print('df input:')
print(df)
df['Output']=df.groupby('Dept')['Ratio'].transform(lambda dept: compareLower(dept.values,lowerThreshold))
df[df['Wave'] <= 3]
print()
print('df - output')
print(df)
def compareLower(dept,lowerThreshold):
print('x:',dept)
#print('y:',y)
return (dept[:,None] < lowerThreshold*dept).sum(-1)
The output for this looks like:
A Wave Dept Ratio Output
1 1 1 A 10 1
2 4 1 A 6 2
3 7 2 A 3 4
4 2 2 A 4 3
5 6 2 B 9 2
6 5 2 B 7 3
7 5 1 B 12 1
8 5 1 B 4 4
9 5 2 C 9 1
10 5 1 C 3 4
11 5 1 C 4 3
12 5 2 C 7 2
I would now like to add an additional grouping that also groups on Wave
. I would like to only include in the group if the Wave
values of the data set are equal to or less than the record Wave. E.g. if the records Wave is equal to 1 and the Dept is equal A then the grouping will only include records that have a Wave equal to or less than 1 and a Dept value equal to A. If the Wave is equal to 2 and the Dept is equal to A then the grouping would include any record that have the a Dept value equal to A and a Wave equal to or less than 2 (so including the records with a Wave value of 1 as well as 2).
So the revised output would look like:
A Wave Dept Ratio Output
1 1 1 A 10 0
2 4 1 A 6 1
3 7 2 A 3 3
4 2 2 A 4 2
5 6 2 B 9 1
6 5 2 B 7 2
7 5 1 B 12 0
8 5 1 B 4 1
9 5 2 C 9 0
10 5 1 C 3 1
11 5 1 C 4 0
12 5 2 C 7 1
I have tried to add the new grouping:
df['Output']=df.groupby('Dept','Wave<=??')['Wave','Ratio'].transform(lambda x: compareLower(x.values,lowerThreshold))
But I don't know how to make the Wave dynamic (as each row has a different Wave value). There is also the possibility of more than 2 Waves.
Upvotes: 2
Views: 41
Reputation: 150785
Let's try your logic here by filling the Output
columns with the rankings when we remove the Wave
one by one:
waves = sorted(set(df['Wave']))[::-1]
df['Output'] = np.nan
for thresh in waves:
df['Output'] = (df[df['Wave'].le(thresh)].groupby('Dept')
.Ratio.rank(ascending=False).sub(1)
.reindex(df.index)
.fillna(df['Output'])
)
Output:
A Wave Dept Ratio Output
1 1 1 A 10 0.0
2 4 1 A 6 1.0
3 7 2 A 3 3.0
4 2 2 A 4 2.0
5 6 2 B 9 1.0
6 5 2 B 7 2.0
7 5 1 B 12 0.0
8 5 1 B 4 1.0
9 5 2 C 9 0.0
10 5 1 C 3 1.0
11 5 1 C 4 0.0
12 5 2 C 7 1.0
Upvotes: 1