Stacey
Stacey

Reputation: 5107

Adding more than one grouping of a dataframe

I have a df :

That looks like:

    A  Wave Dept  Ratio
1   1     1    A     10
2   4     1    A      6
3   7     2    A      3
4   2     2    A      4
5   6     2    B      9
6   5     2    B      7
7   5     1    B     12
8   5     1    B      4
9   5     2    C      9
10  5     1    C      3
11  5     1    C      4
12  5     2    C      7

I am currently grouping the data by Dept and checking each row Ratio against all other ratios with the same department and incrementing a count if the ratio value is greater than the threshold.

The code for this looks like:

    import pandas as pd
    lowerThreshold=1.2
    
    def main():
        print()
        df = pd.DataFrame([
                [1, 1, 'A', 10], 
                [4, 1 ,'A', 6], 
                [7, 2 ,'A', 3],
                [2, 2 ,'A', 4],
                [6, 2 ,'B', 9],
                [5, 2 ,'B', 7],
                [5, 1 ,'B', 12],
                [5, 1 ,'B', 4],
                [5, 2 ,'C', 9],
                [5, 1 ,'C', 3],
                [5, 1 ,'C', 4],
                [5, 2 ,'C', 7]
                ],
             index=['1', '2', '3','4','5','6','7','8','9','10','11','12'],
             columns=['A', 'Wave', 'Dept', 'Ratio'])
        print('df input:')
        print(df)
        
    df['Output']=df.groupby('Dept')['Ratio'].transform(lambda dept: compareLower(dept.values,lowerThreshold))
    df[df['Wave'] <= 3]
    
    print()
    print('df - output')
    print(df)

def compareLower(dept,lowerThreshold):
    print('x:',dept)
    #print('y:',y)
    return (dept[:,None] < lowerThreshold*dept).sum(-1) 

The output for this looks like:

    A  Wave Dept  Ratio  Output
1   1     1    A     10       1
2   4     1    A      6       2
3   7     2    A      3       4
4   2     2    A      4       3
5   6     2    B      9       2
6   5     2    B      7       3
7   5     1    B     12       1
8   5     1    B      4       4
9   5     2    C      9       1
10  5     1    C      3       4
11  5     1    C      4       3
12  5     2    C      7       2

I would now like to add an additional grouping that also groups on Wave. I would like to only include in the group if the Wave values of the data set are equal to or less than the record Wave. E.g. if the records Wave is equal to 1 and the Dept is equal A then the grouping will only include records that have a Wave equal to or less than 1 and a Dept value equal to A. If the Wave is equal to 2 and the Dept is equal to A then the grouping would include any record that have the a Dept value equal to A and a Wave equal to or less than 2 (so including the records with a Wave value of 1 as well as 2).

So the revised output would look like:

    A  Wave Dept  Ratio  Output
1   1     1    A     10       0
2   4     1    A      6       1
3   7     2    A      3       3
4   2     2    A      4       2
5   6     2    B      9       1
6   5     2    B      7       2
7   5     1    B     12       0
8   5     1    B      4       1
9   5     2    C      9       0
10  5     1    C      3       1
11  5     1    C      4       0
12  5     2    C      7       1

I have tried to add the new grouping:

df['Output']=df.groupby('Dept','Wave<=??')['Wave','Ratio'].transform(lambda x: compareLower(x.values,lowerThreshold))

But I don't know how to make the Wave dynamic (as each row has a different Wave value). There is also the possibility of more than 2 Waves.

Upvotes: 2

Views: 41

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150785

Let's try your logic here by filling the Outputcolumns with the rankings when we remove the Wave one by one:

waves = sorted(set(df['Wave']))[::-1]

df['Output'] = np.nan

for thresh in waves:
    df['Output'] = (df[df['Wave'].le(thresh)].groupby('Dept')
                       .Ratio.rank(ascending=False).sub(1)
                       .reindex(df.index)
                       .fillna(df['Output'])
                    )

Output:

    A  Wave Dept  Ratio  Output
1   1     1    A     10     0.0
2   4     1    A      6     1.0
3   7     2    A      3     3.0
4   2     2    A      4     2.0
5   6     2    B      9     1.0
6   5     2    B      7     2.0
7   5     1    B     12     0.0
8   5     1    B      4     1.0
9   5     2    C      9     0.0
10  5     1    C      3     1.0
11  5     1    C      4     0.0
12  5     2    C      7     1.0

Upvotes: 1

Related Questions