Reputation: 2033
I have a dataset with a feature 'abdomcirc' that has multiple values per ChildID, like so:
   ChildID  abdomcirc
0        1        273
1        1        267
2        1        294
3        2        136
4        2        248
I want to calculate the range (max minus min) of the abdomcirc values for each ChildID. So I want to get these results:
   ChildID  range
0        1     27
1        2    112
So I first tried this:
df["range"] = df.groupby('ChildID')["abdomcirc"].transform('range')
But I got this error: ValueError: 'range' is not a valid function name for transform(name)
So, as suggested in the answer to this question, I tried the following line:
df["range"] = df.groupby('ChildID').apply(lambda x: x.High.max() - x.Low.min())
But I got this error: AttributeError: 'DataFrame' object has no attribute 'High'
Not sure why I am getting this error. Any suggestion on how to successfully calculate the range of a group of values in a dataframe?
Upvotes: 0
Views: 732
Reputation: 1317
High is not in df; please replace High with your column:
df.groupby("ChildID").apply(lambda x: x['abdomcirc'].max() - x['abdomcirc'].min())
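Note that the line above returns a Series indexed by ChildID, so assigning it directly to df["range"] would align on the row index rather than on ChildID. If the goal is a per-row column, as in the question's first attempt, a minimal sketch (assuming the sample data from the question) is to pass a callable to transform:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ChildID": [1, 1, 1, 2, 2],
    "abdomcirc": [273, 267, 294, 136, 248],
})

# transform accepts a callable; the per-group scalar is broadcast back
# to every row of that group, so it can be assigned as a new column
df["range"] = (
    df.groupby("ChildID")["abdomcirc"]
      .transform(lambda s: s.max() - s.min())
)
```

Every row for ChildID 1 then carries the same range value, and likewise for ChildID 2.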
Upvotes: 1
Reputation: 323226
NumPy has a function for this, numpy.ptp ("peak to peak", i.e. max minus min):
import numpy as np
s=df.groupby('ChildID')['abdomcirc'].apply(np.ptp).to_frame('range').reset_index()
Out[75]:
   ChildID  range
0        1     27
1        2    112
Or, to fix your code:
df.groupby('ChildID').apply(lambda x: x.abdomcirc.max() - x.abdomcirc.min())
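As an alternative sketch that avoids apply with a Python lambda, the built-in group max and min can be subtracted directly. This assumes the sample data from the question; the variable names g and out are only illustrative:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ChildID": [1, 1, 1, 2, 2],
    "abdomcirc": [273, 267, 294, 136, 248],
})

# Select the column once, then subtract the per-group min from the per-group max
g = df.groupby("ChildID")["abdomcirc"]
out = (g.max() - g.min()).reset_index(name="range")
```

Here out has one row per ChildID with the columns ChildID and range, matching the layout asked for in the question.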
Upvotes: 2