Pandas Groupby with Lambda and Algorithm

Question

Given this data frame:

import pandas as pd
import jenkspy
f = pd.DataFrame({'BreakGroup':['A','A','A','A','A','A','B','B','B','B','B'],
                 'Final':[1,2,3,4,5,6,10,20,30,40,50]})
   BreakGroup  Final
0           A      1
1           A      2
2           A      3
3           A      4
4           A      5
5           A      6
6           B     10
7           B     20
8           B     30
9           B     40
10          B     50

I'd like to use jenkspy to compute natural breaks for 4 groups (classes) within each "BreakGroup", and then identify the group to which each value in "Final" belongs.

I started out by doing this:

jenks=lambda x: jenkspy.jenks_breaks(f['Final'].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)

...which results in:

BreakGroup
A    [1.0, 10.0, 20.0, 30.0, 50.0]
B    [1.0, 10.0, 20.0, 30.0, 50.0]
Name: BreakGroup, dtype: object

The first problem here, as you may well have surmised, is that the lambda is applied to the whole "Final" column instead of just the values belonging to each group in the groupby. The second problem is that I need a column designating the correct group (class) membership for each row, presumably by using transform instead of apply.
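To illustrate the first problem without needing jenkspy, the sketch below swaps in a hypothetical stand-in for jenks_breaks that simply returns the values as a list. A lambda that ignores its argument x returns the same result for every group, while a lambda that uses x only sees that group's "Final" values.

```python
import pandas as pd

f = pd.DataFrame({'BreakGroup': ['A'] * 6 + ['B'] * 5,
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

# Buggy pattern: x is ignored, so every group gets the full 'Final' column
ignores_x = f.groupby('BreakGroup')['Final'].apply(lambda x: f['Final'].tolist())

# Fixed pattern: x holds only the current group's 'Final' values
uses_x = f.groupby('BreakGroup')['Final'].apply(lambda x: x.tolist())

print(ignores_x['A'])  # all 11 values, identical for 'A' and 'B'
print(uses_x['A'])     # [1, 2, 3, 4, 5, 6]
```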

I then tried this:

jenks=lambda x: jenkspy.jenks_breaks(f['Final'].loc[f['BreakGroup']==x].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)

...but was promptly beaten back into submission:

ValueError: Can only compare identically-labeled Series objects
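A minimal reproduction of this error, assuming transform passes the lambda the group's slice of the "BreakGroup" column (6 rows for group A). Comparing the full 11-row column against that slice with == raises, because pandas only compares Series element-wise when their index labels match. Comparing against a single scalar label works.

```python
import pandas as pd

f = pd.DataFrame({'BreakGroup': ['A'] * 6 + ['B'] * 5,
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

# Roughly what transform hands the lambda for group 'A': a 6-row Series of labels
x = f.loc[f['BreakGroup'] == 'A', 'BreakGroup']

message = None
try:
    f['BreakGroup'] == x  # 11-row Series vs 6-row Series with different labels
except ValueError as err:
    message = str(err)
print(message)  # Can only compare identically-labeled Series objects

# Comparing against a scalar label instead broadcasts over all rows
mask = f['BreakGroup'] == x.iloc[0]
print(f.loc[mask, 'Final'].tolist())  # [1, 2, 3, 4, 5, 6]
```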

Update:

Here is the desired result. The "Result" column contains the upper limit of the group for the respective value from "Final" per group "BreakGroup":

   BreakGroup  Final  Result
0           A      1       2
1           A      2       3
2           A      3       4
3           A      4       4
4           A      5       6
5           A      6       6
6           B     10      20
7           B     20      30
8           B     30      40
9           B     40      50
10          B     50      50

Thanks in advance!

My slightly modified application, based on the accepted solution:

f.sort_values('BreakGroup', inplace=True)
f.reset_index(drop=True, inplace=True)
# natural breaks computed from each group's own 'Final' values
jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(), nb_class=4)
g = f.set_index('BreakGroup')
# attach each group's list of breaks to every row of that group
g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
g.reset_index(inplace=True)
groups = lambda x: [gp for gp in x['Groups']]
# 'Final' value should be > lower and <= upper
upper = lambda x: [gp for gp in x['Groups'] if gp >= x['Final']][0]  # or gp == max(x['Groups'])
lower = lambda x: [gp for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
GroupIndex = lambda x: [x['Groups'].index(gp) for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
f['Groups'] = g.apply(groups, axis=1)
f['Upper'] = g.apply(upper, axis=1)
f['Lower'] = g.apply(lower, axis=1)
f['Group'] = g.apply(GroupIndex, axis=1)
# turn the 0-based index of the lower boundary into a 1-based group number
f['Group'] = f['Group'] + 1

This adds the following columns to f:

Groups: the list of group boundaries
Upper: the upper boundary pertinent to the value in "Final"
Lower: the lower boundary pertinent to the value in "Final"
Group: the group to which the value in "Final" belongs, based on the logic noted in the comments
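A vectorized alternative for the Group column is pd.cut, whose bins are right-closed (lower, upper], matching the "> lower and <= upper" rule in the comments; include_lowest=True puts each group's minimum into class 1. This is a sketch under one assumption: to avoid depending on jenkspy, the break lists are hardcoded from the accepted answer's output. In practice they would come from jenks_breaks per group. The helper class_number is hypothetical.

```python
import pandas as pd

f = pd.DataFrame({'BreakGroup': ['A'] * 6 + ['B'] * 5,
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

# Assumption: breaks copied from the accepted answer's output, not recomputed here
breaks = {'A': [1.0, 2.0, 3.0, 4.0, 6.0],
          'B': [10.0, 20.0, 30.0, 40.0, 50.0]}

def class_number(s):
    # s holds one group's 'Final' values; transform sets s.name to the group key
    codes = pd.cut(s, bins=breaks[s.name], labels=False, include_lowest=True)
    return codes + 1  # 1-based, like the Group column

f['Group'] = f.groupby('BreakGroup')['Final'].transform(class_number)
print(f['Group'].tolist())  # [1, 1, 2, 3, 4, 4, 1, 1, 2, 3, 4]
```

For this data the result agrees with the GroupIndex + 1 logic above. Note that pd.cut raises on duplicate bin edges (possible when a group has few distinct values) unless duplicates='drop' is passed.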

EFT · Accepted Answer

Your jenks lambda never uses x, its own argument: it always reads the full f['Final'] column, so its result doesn't depend on which group apply or transform feeds it. Changing the definition of jenks to

jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(),nb_class=4)

gives

In [315]: f.groupby(['BreakGroup']).apply(jenks)
Out[315]: 
BreakGroup
A         [1.0, 2.0, 3.0, 4.0, 6.0]
B    [10.0, 20.0, 30.0, 40.0, 50.0]
dtype: object

Continuing from this redefinition,

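g = f.set_index('BreakGroup')
# align the per-group break lists to the rows via the shared 'BreakGroup' index
g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
g.reset_index(inplace=True)
# first break strictly greater than 'Final', falling back to the group maximum
group = lambda x: [gp for gp in x['Groups'] if gp > x['Final'] or gp == max(x['Groups'])][0]
# g keeps f's row order, so the result aligns with f on the default RangeIndex
f['Result'] = g.apply(group, axis=1)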

gives

In [323]: f
Out[323]: 
   BreakGroup  Final  Result
0           A      1     2.0
1           A      2     3.0
2           A      3     4.0
3           A      4     6.0
4           A      5     6.0
5           A      6     6.0
6           B     10    20.0
7           B     20    30.0
8           B     30    40.0
9           B     40    50.0
10          B     50    50.0
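The same "first break strictly greater than Final, otherwise the maximum" rule can be computed per group with NumPy's searchsorted, which avoids both the set_index/reset_index round trip and the row-wise apply. A sketch assuming the break lists are already known (hardcoded here from the output above rather than computed with jenkspy); next_break is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

f = pd.DataFrame({'BreakGroup': ['A'] * 6 + ['B'] * 5,
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

# Assumption: same break lists as the groupby(...).apply(jenks) output above
breaks = {'A': [1.0, 2.0, 3.0, 4.0, 6.0],
          'B': [10.0, 20.0, 30.0, 40.0, 50.0]}

def next_break(s):
    b = np.asarray(breaks[s.name])  # s.name is the group key inside transform
    # position of the first break strictly greater than each value
    idx = np.searchsorted(b, s.to_numpy(), side='right')
    # values at or above the last break fall back to the maximum
    idx = np.minimum(idx, len(b) - 1)
    return pd.Series(b[idx], index=s.index)

f['Result'] = f.groupby('BreakGroup')['Final'].transform(next_break)
print(f['Result'].tolist())  # same values as Result above: 2, 3, 4, 6, 6, 6, 20, 30, 40, 50, 50
```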
