I have a pandas DataFrame with a column of item numbers and a numeric value next to each one. I would like to compute the average of that value across all rows that share the same item number.
Here is a part of the DataFrame:
Item ID Time
X32TR2639 7.142857
X32TR2639 7.142857
X36SL7708 16.714286
X36TA0029 16.714286
X36TR3016 16.714286
Desired output:
Item ID Average Time:
X32TR2639 7.142857
X36SL7708 16.714286
X36TA0029 16.714286
X36TR3016 16.714286
For each Item ID I would like a single average time; if an Item ID appears more than once, the average should be taken over all of its rows.
This is only a small part of the DataFrame. As you can see, the first two rows have the same Item ID. So the script would find every row with item number X32TR2639, take the number next to it, and average those values.
Upvotes: 2
Views: 941
I would propose a straightforward groupby.mean followed by a reset_index.
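import pandas as pd
data = {"Item ID":['X32TR2639','X32TR2639','X36SL7708','X36TA0029','X36TR3016'],'time':[7.142857,7.142857,16.714286,16.714286,16.714286]}
df = pd.DataFrame(data)
# Average 'time' per 'Item ID', then turn the group key back into a regular column
df.groupby('Item ID').mean().reset_index()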
Item ID time
0 X32TR2639 7.142857
1 X36SL7708 16.714286
2 X36TA0029 16.714286
3 X36TR3016 16.714286
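If the result column should be named as in the desired output, the averaged column can be renamed afterwards. This is only a minimal sketch, assuming the value column is called `Time` as in the question (the example above uses `time`); passing `as_index=False` keeps `Item ID` as a regular column, so `reset_index` is not needed in this variant:

```python
import pandas as pd

# Sample rows copied from the question; the column name "Time" is an assumption.
df = pd.DataFrame({
    "Item ID": ["X32TR2639", "X32TR2639", "X36SL7708", "X36TA0029", "X36TR3016"],
    "Time": [7.142857, 7.142857, 16.714286, 16.714286, 16.714286],
})

# Average Time over all rows sharing an Item ID, keeping Item ID as a column,
# then rename the averaged column to match the desired output header.
result = (
    df.groupby("Item ID", as_index=False)["Time"]
      .mean()
      .rename(columns={"Time": "Average Time"})
)
print(result)
```

Selecting `["Time"]` before calling `mean` also avoids trying to average any other columns the full DataFrame may contain.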
I tried this on a DataFrame with 50,000 rows; here is the timing.
df
ID time
0 X32TR2639 0.837810
1 X32TR2639 0.855781
2 X36SL7708 0.322786
3 X36TA0029 0.441353
4 X36TR3016 0.254487
... ...
49995 X32TR2639 0.885251
49996 X32TR2639 0.315009
49997 X36SL7708 0.298589
49998 X36TA0029 0.229855
49999 X36TR3016 0.933437
[50000 rows x 2 columns]
%timeit df.groupby('ID').mean().reset_index()
4.76 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This is the output DataFrame after applying groupby.mean to the 50k-row DataFrame with duplicates.
df.groupby('ID').mean().reset_index()
ID time
0 X32TR2639 0.493729
1 X36SL7708 0.500936
2 X36TA0029 0.501064
3 X36TR3016 0.492773
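The answer does not show how the 50k-row DataFrame was built. One way to set up a comparable test, assuming the five-row ID pattern is simply repeated and the `time` values are uniform random numbers in [0, 1) (both are guesses based on the printed rows, and the seed is arbitrary), might look like this:

```python
import numpy as np
import pandas as pd

# Assumption: the ID pattern from the small example, repeated 10,000 times.
ids = ["X32TR2639", "X32TR2639", "X36SL7708", "X36TA0029", "X36TR3016"]
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

big = pd.DataFrame({
    "ID": np.tile(ids, 10_000),   # 5 * 10,000 = 50,000 rows
    "time": rng.random(50_000),   # uniform values in [0, 1)
})

# Same operation as in the answer: one mean per distinct ID.
means = big.groupby("ID").mean().reset_index()
print(means)
```

With uniform values, each group mean should land near 0.5, which is in line with the output shown above; the exact numbers and timings will differ from run to run and machine to machine.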
Upvotes: 5