Reputation: 7536
Given this data frame:
import pandas as pd
import jenkspy
f = pd.DataFrame({'BreakGroup':['A','A','A','A','A','A','B','B','B','B','B'],
'Final':[1,2,3,4,5,6,10,20,30,40,50]})
BreakGroup Final
0 A 1
1 A 2
2 A 3
3 A 4
4 A 5
5 A 6
6 B 10
7 B 20
8 B 30
9 B 40
10 B 50
Within each "BreakGroup", I'd like to use jenkspy to compute natural breaks for 4 groups (classes) and then identify the class to which each value in "Final" belongs.
I started out by doing this:
jenks=lambda x: jenkspy.jenks_breaks(f['Final'].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)
...which results in:
BreakGroup
A [1.0, 10.0, 20.0, 30.0, 50.0]
B [1.0, 10.0, 20.0, 30.0, 50.0]
Name: BreakGroup, dtype: object
The first problem here, as you may well have surmised, is that it applies the lambda function to the whole column of "Final" scores instead of just those belonging to each group in the Groupby. The second problem is that I need a column designating the correct group (class) membership, presumably by using transform instead of apply.
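The first problem can be seen in isolation without jenkspy. The sketch below swaps jenkspy.jenks_breaks for a hypothetical stand-in, value_range, which only returns the minimum and maximum of whatever it receives; the names ignores_group and uses_group are also made up for illustration. A lambda that never touches its argument x gives the same answer for every group, while one that reads x sees only that group's values.

```python
import pandas as pd

f = pd.DataFrame({
    'BreakGroup': ['A'] * 6 + ['B'] * 5,
    'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50],
})

# Hypothetical stand-in for jenkspy.jenks_breaks: [min, max] of the input.
def value_range(values):
    return [min(values), max(values)]

# Ignores its argument and always reads the whole "Final" column.
ignores_group = lambda x: value_range(f['Final'].tolist())
# Uses its argument, which holds only the current group's values.
uses_group = lambda x: value_range(x.tolist())

grouped = f.groupby('BreakGroup')['Final']
whole_column = grouped.apply(ignores_group)  # same range for A and B
per_group = grouped.apply(uses_group)        # one range per group
```

Here whole_column holds the range of the entire column for both A and B, whereas per_group holds a separate range for each.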
I then tried this:
jenks=lambda x: jenkspy.jenks_breaks(f['Final'].loc[f['BreakGroup']==x].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)
...but was promptly beaten back into submission:
ValueError: Can only compare identically-labeled Series objects
Update:
Here is the desired result. The "Result" column contains the upper limit of the group for the respective value from "Final" per group "BreakGroup":
BreakGroup Final Result
0 A 1 2
1 A 2 3
2 A 3 4
3 A 4 4
4 A 5 6
5 A 6 6
6 B 10 20
7 B 20 30
8 B 30 40
9 B 40 50
10 B 50 50
Thanks in advance!
My slightly modified application, based on the accepted solution:
f.sort_values('BreakGroup',inplace=True)
f.reset_index(drop=True,inplace=True)
jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(),nb_class=4)
g = f.set_index('BreakGroup')
g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
g.reset_index(inplace=True)
groups = lambda x: [gp for gp in x['Groups']]
# 'Final' value should be > lower and <= upper
upper = lambda x: [gp for gp in x['Groups'] if gp >= x['Final']][0]  # or gp == max(x['Groups'])
lower = lambda x: [gp for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
GroupIndex = lambda x: [x['Groups'].index(gp) for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
f['Groups'] = g.apply(groups, axis=1)
f['Upper'] = g.apply(upper, axis=1)
f['Lower'] = g.apply(lower, axis=1)
f['Group'] = g.apply(GroupIndex, axis=1)
f['Group'] = f['Group'] + 1
This returns:
The list of group boundaries
The upper boundary pertinent to the value for "Final"
The lower boundary pertinent to the value for "Final"
The group to which the value for "Final" will belong based on logic noted in comments.
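The boundary logic in the upper, lower and GroupIndex lambdas can also be sketched with bisect from the standard library. This is only an illustration under assumptions: classify is a hypothetical helper, the break lists are hard-coded from the jenkspy output reported in the accepted answer rather than recomputed, and the breaks are assumed to be sorted and unique.

```python
from bisect import bisect_left

# Break lists copied from the accepted answer's output (not recomputed here).
breaks = {
    'A': [1.0, 2.0, 3.0, 4.0, 6.0],
    'B': [10.0, 20.0, 30.0, 40.0, 50.0],
}

def classify(group_breaks, value):
    """Return (lower, upper, group) so that lower < value <= upper.

    The smallest value is a special case: it gets lower == upper == first break.
    Assumes value does not exceed the last break.
    """
    i = bisect_left(group_breaks, value)  # index of the first break >= value
    lo = max(i - 1, 0)                    # index of the last break < value, or 0
    return group_breaks[lo], group_breaks[i], lo + 1

a_rows = [classify(breaks['A'], v) for v in (1, 2, 3, 4, 5, 6)]
b_rows = [classify(breaks['B'], v) for v in (10, 20, 30, 40, 50)]
```

Each tuple corresponds to the "Lower", "Upper" and "Group" columns for one row.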
Upvotes: 3
Views: 1740
Reputation: 2369
You have jenks defined as a constant in terms of x, your lambda variable, so it doesn't depend on what you feed it with apply or transform. Changing the definition of jenks to
jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(),nb_class=4)
gives
In [315]: f.groupby(['BreakGroup']).apply(jenks)
Out[315]:
BreakGroup
A [1.0, 2.0, 3.0, 4.0, 6.0]
B [10.0, 20.0, 30.0, 40.0, 50.0]
dtype: object
Continuing from this redefinition,
g = f.set_index('BreakGroup')
g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
g.reset_index(inplace=True)
group = lambda x: [gp for gp in x['Groups'] if gp > x['Final'] or gp == max(x['Groups'])][0]
f['Result'] = g.apply(group, axis=1)
gives
In [323]: f
Out[323]:
BreakGroup Final Result
0 A 1 2.0
1 A 2 3.0
2 A 3 4.0
3 A 4 6.0
4 A 5 6.0
5 A 6 6.0
6 B 10 20.0
7 B 20 30.0
8 B 30 40.0
9 B 40 50.0
10 B 50 50.0
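The group lambda scans each list of breaks for the first break greater than the value and falls back to the last break for the maximum. As a rough sketch of the same lookup, the standard library's bisect_right can be used; upper_break is a hypothetical helper, and the break lists are hard-coded from the output above instead of being recomputed with jenkspy.

```python
from bisect import bisect_right

# Break lists copied from the groupby output above (not recomputed here).
breaks = {
    'A': [1.0, 2.0, 3.0, 4.0, 6.0],
    'B': [10.0, 20.0, 30.0, 40.0, 50.0],
}

def upper_break(group_breaks, value):
    """Return the first break strictly greater than value, else the last break."""
    i = bisect_right(group_breaks, value)  # count of breaks <= value
    return group_breaks[min(i, len(group_breaks) - 1)]

rows = [('A', v) for v in (1, 2, 3, 4, 5, 6)] + \
       [('B', v) for v in (10, 20, 30, 40, 50)]
result = [upper_break(breaks[g], v) for g, v in rows]
```

For this data, result matches the "Result" column shown above.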
Upvotes: 3
Reputation: 107652
Currently, you are passing a Series into transform(), not a scalar as you intend for the filter condition. Consider taking the first value of the group with x.iloc[0], since all values are the same in a groupby series (x.index[0] would return the row label, not the group value). You can even use min(x) or max(x). Because the function returns a list of breaks rather than one value per row, call it with apply() instead of transform():
jenks = lambda x: jenkspy.jenks_breaks(f['Final'].loc[f['BreakGroup']==x.iloc[0]].tolist(), nb_class=4)
breaks = f.groupby(['BreakGroup'])['BreakGroup'].apply(jenks)
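The ValueError from the question can be reproduced without jenkspy. In this small sketch, x is built by hand to mimic what the lambda receives for group A, and the scalar group value is pulled out with x.iloc[0] (x.index[0] would give the row label 0 instead). The variable names are chosen only for illustration.

```python
import pandas as pd

f = pd.DataFrame({
    'BreakGroup': ['A'] * 6 + ['B'] * 5,
    'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50],
})

# Mimics the argument passed to the lambda for group 'A': six rows, labels 0-5.
x = f.loc[f['BreakGroup'] == 'A', 'BreakGroup']

# Comparing the full 11-row column with the 6-row group Series fails,
# because the two Series do not share the same index labels.
try:
    f['BreakGroup'] == x
    raised = False
except ValueError:
    raised = True

# Comparing against the scalar group value works and selects the group's rows.
mask = f['BreakGroup'] == x.iloc[0]
values = f.loc[mask, 'Final'].tolist()
```

After running this, raised is True and values holds only the "Final" values of group A.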
Upvotes: 1