Reputation: 53
I have a DataFrame grouped_reps that contains names and an array of numbers associated with each name.
The DataFrame looks basically like this:
import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})
Both columns currently have the object dtype.
I'm trying to take the mean of each array associated with a name and store it in a new column in the DataFrame, but to do this I have to convert each array to a float array first. I'm trying to do this with:
grouped_reps["util_rate"] = grouped_reps["util_rate"].astype(str).astype(float)
But I get this error:
ValueError: could not convert string to float: '[1.0, 0.75, 0.9]'
Upvotes: 2
Views: 1968
Reputation: 35626
To get the mean of each list, explode the list into multiple rows, convert to float via astype, then calculate the mean on level=0:
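grouped_reps['mean'] = (
    # Series.mean(level=0) was removed in pandas 2.0;
    # groupby(level=0).mean() is the equivalent call.
    grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
)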
grouped_reps:
      A                util_rate      mean
0  John         [1.0, 0.75, 0.9]  0.883333
1  Mary         [1.0, 0.8, 0.87]  0.890000
2   Tom  [0.74, 0.34, 0.9, 0.45]  0.607500
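For anyone who wants to paste and run the whole thing, here is a minimal end-to-end sketch using the sample data from the question. It assumes a recent pandas, so the per-index mean is written as groupby(level=0).mean(). The intermediate name `exploded` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
grouped_reps = pd.DataFrame({
    "A": ["John", "Mary", "Tom"],
    "util_rate": [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87], [0.74, 0.34, 0.90, 0.45]],
})

# One row per list element; the original row label is repeated in the index.
exploded = grouped_reps["util_rate"].explode().astype(float)

# Average the elements that share an index label, then align back on that index.
grouped_reps["mean"] = exploded.groupby(level=0).mean()

print(grouped_reps)
```

The assignment works because the grouped result is indexed by the original row labels, so pandas aligns it with the DataFrame automatically.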
Explanation:
Explode produces a series where each element is in its own row:
grouped_reps['util_rate'].explode()
0 1.0
0 0.75
0 0.9
1 1.0
1 0.8
1 0.87
2 0.74
2 0.34
2 0.9
2 0.45
Name: util_rate, dtype: object
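To make the index behaviour easier to see, here is a small sketch with made-up string labels (not the data from the question). It shows that explode repeats the label of the source row for every element, and that an empty list becomes a single NaN row rather than disappearing.

```python
import pandas as pd

# Hypothetical data: string labels make the repeated index obvious,
# and the empty list shows how explode treats rows with no elements.
s = pd.Series([[1.0, 0.75], [0.8], []], index=["John", "Mary", "Tom"])

exploded = s.explode()

print(exploded.index.tolist())   # ['John', 'John', 'Mary', 'Tom']
print(exploded.dtype)            # object: explode does not convert the values
print(exploded.isna().tolist())  # [False, False, False, True]
```

Because the result is still object dtype, the astype(float) step is needed before doing any arithmetic.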
Convert to float:
grouped_reps['util_rate'].explode().astype(float)
0 1.00
0 0.75
0 0.90
1 1.00
1 0.80
1 0.87
2 0.74
2 0.34
2 0.90
2 0.45
Name: util_rate, dtype: float64
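This is also where the attempt in the question goes wrong. A rough sketch of the difference: astype(str) turns each whole list into one string such as '[1.0, 0.75, 0.9]', which float() cannot parse, whereas after explode each element is a single scalar that converts cleanly. The mixed string/float values below are made up for illustration.

```python
import pandas as pd

# What astype(str) effectively does to one cell: the whole list becomes text.
as_text = str([1.0, 0.75, 0.9])   # '[1.0, 0.75, 0.9]'
try:
    float(as_text)
    failed = False
except ValueError:
    failed = True                 # same error as in the question

# After explode each row holds one scalar, so astype(float) works,
# even if some elements were stored as numeric strings (hypothetical data).
exploded = pd.Series([["1.0", "0.75"], [0.9]]).explode()
as_float = exploded.astype(float)

print(failed)          # True
print(as_float.dtype)  # float64
```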
Since the exploded index matches the index of the original rows, we can group on level=0 and take the mean of each group:
grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
0 0.883333
1 0.890000
2 0.607500
Name: util_rate, dtype: float64
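As a sanity check, the level-based mean can be compared with a plain per-row calculation. The sketch below is an alternative way to get the same numbers, not part of the approach above: it applies statistics.fmean to each list directly. The names `by_level`, `per_row` and `max_diff` are illustrative only.

```python
from statistics import fmean

import pandas as pd

grouped_reps = pd.DataFrame({
    "A": ["John", "Mary", "Tom"],
    "util_rate": [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87], [0.74, 0.34, 0.90, 0.45]],
})

# Explode-based approach: mean of the elements sharing each index label.
by_level = grouped_reps["util_rate"].explode().astype(float).groupby(level=0).mean()

# Alternative for comparison: compute the mean of each list row by row.
per_row = grouped_reps["util_rate"].apply(lambda xs: fmean([float(x) for x in xs]))

# The two Series share the same index, so subtraction aligns them row for row.
max_diff = (by_level - per_row).abs().max()
print(max_diff < 1e-12)
```

The apply version is shorter for this case; the explode version tends to be preferable if you need other vectorised operations on the individual elements as well.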
Upvotes: 2