user3599803

Reputation: 7034

pandas - how to work with arrays in cells

I have a dataframe with arrays. Example:

df = pd.DataFrame([('a', [1,2,3]), ('b', [4,5,6])], columns=['name', 'values'])

  name     values
0    a  [1, 2, 3]
1    b  [4, 5, 6]

I know that the arrays in the values column all have the same length. I want to compute their element-wise average, i.e. the mean along axis=0. In NumPy I could do it like this:

np.array([[1,2,3], [4,5,6]]).mean(axis=0) # result: array([2.5, 3.5, 4.5])

Is it possible with plain pandas?
If not, how can I easily convert the values column to a NumPy array? I've tried df['values'].values, but this does not give a matrix:

array([list([1, 2, 3]), list([4, 5, 6])], dtype=object)

Upvotes: 5

Views: 3690

Answers (3)

BENY

Reputation: 323316

Here is one way: expand the lists into their own DataFrame and take the column-wise mean.

pd.DataFrame(df['values'].tolist()).mean()
Out[336]: 
0    2.5
1    3.5
2    4.5
dtype: float64
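
To make the intermediate step visible, here is a minimal sketch of the same idea split into parts. The names expanded, col_means and col_means_array are only illustrative, and passing index=df.index is an optional extra that keeps the original row labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('a', [1, 2, 3]), ('b', [4, 5, 6])], columns=['name', 'values'])

# Each inner list becomes one row, so list positions become columns 0, 1, 2.
expanded = pd.DataFrame(df['values'].tolist(), index=df.index)

# DataFrame.mean() defaults to axis=0, giving one mean per column.
col_means = expanded.mean()

# Convert the resulting Series to a plain NumPy array if that is what you need.
col_means_array = col_means.to_numpy()
print(col_means_array)
```

This relies on every list having the same length, as stated in the question.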

Upvotes: 0

qaiser

Reputation: 2868

Use Series.tolist() to convert the column to a list of lists, then build a NumPy array from it:

np.array(df['values'].tolist()).mean(axis = 0)

# output
array([2.5, 3.5, 4.5])
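
A short sketch of why the tolist() round trip matters, assuming the same example frame as in the question. The variable names as_objects and matrix are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('a', [1, 2, 3]), ('b', [4, 5, 6])], columns=['name', 'values'])

# Taking the column directly gives a 1-D object array whose elements are lists.
as_objects = df['values'].to_numpy()
print(as_objects.dtype, as_objects.shape)

# Going through a list of lists lets NumPy build a real 2-D numeric array.
matrix = np.array(df['values'].tolist())
print(matrix.shape)

# With a 2-D array, axis=0 averages across rows, position by position.
result = matrix.mean(axis=0)
print(result)
```

Because the lists all have the same length, NumPy can infer a rectangular shape. With ragged lists this conversion would not produce a 2-D numeric array.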

Upvotes: 5

Itamar Mushkin

Reputation: 2905

If you have only one column you want to work on, you can use apply on that column. apply on a pd.Series (e.g. a column) works per element, so this gives one mean per row rather than the axis=0 mean across rows. For example:

df = pd.DataFrame([('a', [1,2,3]), ('b', [4,5,6])], columns=['name', 'values'])
df['values_mean'] = df['values'].apply(lambda x: np.mean(x, axis=0))
df

Yields:

  name     values  values_mean
0    a  [1, 2, 3]          2.0
1    b  [4, 5, 6]          5.0

If you have more than one column, the applymap function works per element on a pd.DataFrame (whereas apply on a DataFrame works per column). Note that applymap is deprecated since pandas 2.1 in favor of DataFrame.map. For example:

df = pd.DataFrame([('a', [1,2,3]), ('b', [4,5,6])], columns=['name', 'values'])
df[['values']].applymap(lambda x: np.mean(x, axis=0))

Yields:

   values
0     2.0
1     5.0
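
For comparison, here is a rough sketch that puts the per-row mean from apply next to the axis=0 mean the question asks about. It assumes the same example frame, and the names per_row_apply, per_row and per_position are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('a', [1, 2, 3]), ('b', [4, 5, 6])], columns=['name', 'values'])

# apply calls np.mean once per cell, so each list collapses to a single number.
per_row_apply = df['values'].apply(np.mean)

# The same two kinds of mean, taken from a 2-D array built from the column.
matrix = np.array(df['values'].tolist())
per_row = matrix.mean(axis=1)       # one value per row: 2.0 and 5.0
per_position = matrix.mean(axis=0)  # one value per list position: 2.5, 3.5, 4.5

print(per_row_apply.tolist())
print(per_row)
print(per_position)
```

So apply is a good fit when you want a summary per row, while the element-wise average across rows needs the lists stacked into a 2-D structure first.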

Upvotes: 0
