garchukins

Reputation: 53

How to convert array column to int array in Pandas?

I have a DataFrame grouped_reps that contains names and a certain array of numbers associated with those names.

The DataFrame looks like this:

import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

Both columns currently have the object dtype.

I'm trying to take the mean of each array associated with a name and store it in a new column in the DataFrame, but to do this I have to convert the array to a float array first. I'm trying to do this with:

grouped_reps["util_rate"] = grouped_reps["util_rate"].astype(str).astype(float)

But I get this error:

ValueError: could not convert string to float: '[1.0, 0.75, 0.9]'

Upvotes: 2

Views: 1968

Answers (1)

Henry Ecker

Reputation: 35626

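To get the mean of each list, explode the lists into one row per element, convert to float with astype, then group on index level 0 and take the mean. (Series.mean(level=0) was deprecated in pandas 1.3 and removed in 2.0; groupby(level=0).mean() is the replacement.)

grouped_reps['mean'] = (
    grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
)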

grouped_reps:

      A                util_rate      mean
0  John         [1.0, 0.75, 0.9]  0.883333
1  Mary         [1.0, 0.8, 0.87]  0.890000
2   Tom  [0.74, 0.34, 0.9, 0.45]  0.607500
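
As for the error in the question: astype(str) converts each whole list into a single string such as '[1.0, 0.75, 0.9]', and that string is not a valid float literal. A minimal sketch reproducing this on a one-row Series (the names s, as_text and failed are illustrative, not from the question):

```python
import pandas as pd

# One-row Series holding a list, mirroring a single util_rate cell
s = pd.Series([[1.0, 0.75, 0.90]])

# astype(str) stringifies the entire list, not its individual elements
as_text = s.astype(str)
print(as_text.iloc[0])

# A bracketed list literal cannot be parsed as a float, hence the ValueError
try:
    as_text.astype(float)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```

This is why the values have to be separated out of the lists before any float conversion.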

Explanation:

explode produces a Series in which each list element gets its own row and keeps the index of the row it came from:

grouped_reps['util_rate'].explode()
0     1.0
0    0.75
0     0.9
1     1.0
1     0.8
1    0.87
2    0.74
2    0.34
2     0.9
2    0.45
Name: util_rate, dtype: object

Convert to float:

grouped_reps['util_rate'].explode().astype(float)
0    1.00
0    0.75
0    0.90
1    1.00
1    0.80
1    0.87
2    0.74
2    0.34
2    0.90
2    0.45
Name: util_rate, dtype: float64

Since each exploded value keeps the index of its original row, we can group on index level 0 and take the mean of each group:

grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
0    0.883333
1    0.890000
2    0.607500
Name: util_rate, dtype: float64
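
Putting the steps together, here is a self-contained sketch that should run on pandas 2.0 and later, where mean(level=0) is unavailable, so it uses groupby(level=0).mean() for the per-row mean. The direct_mean check at the end is an alternative added here for comparison (a plain Python average per list) and is not part of the approach above:

```python
import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

# Explode to one value per row, cast to float, then average per original row.
# The result is indexed 0, 1, 2, so it aligns with the DataFrame on assignment.
grouped_reps['mean'] = (
    grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
)

# Alternative for comparison (an assumption, not from the answer):
# average each list directly without exploding
direct_mean = grouped_reps['util_rate'].map(lambda values: sum(values) / len(values))

print(grouped_reps)
```

Both routes should agree up to floating-point rounding. The explode version keeps the work in pandas operations, while the per-list version is arguably easier to read for small frames.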

Upvotes: 2
