Reputation: 7034
I have a dataframe with arrays. Example:
df = pd.DataFrame([('a', [1,2,3]), ('b', [4,5,6])], columns=['name', 'values'])
  name     values
0    a  [1, 2, 3]
1    b  [4, 5, 6]
I know that the arrays in the values column all have the same length.
I want to calculate the average on axis=0 of the values arrays.
In numpy I could do it like:
np.array([[1,2,3], [4,5,6]]).mean(axis=0) # result: array([2.5, 3.5, 4.5])
Is it possible with plain pandas?
If not, how can I easily convert the values column to a NumPy array?
I've tried df['values'].values, but this does not give a matrix:
array([list([1, 2, 3]), list([4, 5, 6])], dtype=object)
Upvotes: 5
Views: 3690
Reputation: 323316
Here is one way: expand the lists into a DataFrame with one column per position, then take the column means.
pd.DataFrame(df['values'].tolist()).mean()
Out[336]:
0 2.5
1 3.5
2 4.5
dtype: float64
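For a self-contained version of the same idea, here is a minimal sketch. The helper name column_mean is hypothetical, and it assumes every list in the column has the same length:

```python
import pandas as pd


def column_mean(series: pd.Series) -> pd.Series:
    """Mean per list position across all rows of a column of equal-length lists."""
    # Expanding the lists gives one DataFrame column per list position.
    expanded = pd.DataFrame(series.tolist(), index=series.index)
    # DataFrame.mean() defaults to axis=0, so this is one mean per position.
    return expanded.mean()


df = pd.DataFrame([('a', [1, 2, 3]), ('b', [4, 5, 6])], columns=['name', 'values'])
result = column_mean(df['values'])
print(result.tolist())  # [2.5, 3.5, 4.5]
```

The result stays a pandas Series indexed by list position, so no explicit NumPy call is needed.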
Upvotes: 0
Reputation: 2868
Use Series.tolist() to convert the column to a list of lists, then build a NumPy array from it:
np.array(df['values'].tolist()).mean(axis = 0)
# output
array([2.5, 3.5, 4.5])
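If the equal-length guarantee is not certain, the conversion can be guarded. This is only a sketch: the helper name to_matrix is hypothetical, and raising ValueError on ragged input is an assumed design choice, not something from the question.

```python
import numpy as np
import pandas as pd


def to_matrix(series: pd.Series) -> np.ndarray:
    """Convert a column of equal-length lists to a 2-D NumPy array."""
    # Check the lengths up front so ragged input fails with a clear message.
    if series.map(len).nunique() > 1:
        raise ValueError("all lists in the column must have the same length")
    return np.array(series.tolist())


df = pd.DataFrame([('a', [1, 2, 3]), ('b', [4, 5, 6])], columns=['name', 'values'])
matrix = to_matrix(df['values'])
mean = matrix.mean(axis=0)
print(matrix.shape)  # (2, 3)
print(mean)
```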
Upvotes: 5
Reputation: 2905
If you have only one column you want to work on, you can use apply on the relevant column. apply on a pd.Series (e.g. a column) works per element.
For example:
df = pd.DataFrame([('a', [1,2,3]), ('b', [4,5,6])], columns=['name', 'values'])
df['values_mean'] = df['values'].apply(lambda x: np.mean(x, axis=0))
df
Yields:
  name     values  values_mean
0    a  [1, 2, 3]          2.0
1    b  [4, 5, 6]          5.0
If you have more than one column, the applymap function works on a pd.DataFrame per element (instead of apply on a DataFrame, which works per column). Note that applymap was deprecated in pandas 2.1 and renamed to DataFrame.map. For example:
df = pd.DataFrame([('a', [1,2,3]), ('b', [4,5,6])], columns=['name', 'values'])
df[['values']].applymap(lambda x: np.mean(x, axis=0))
Yields:
   values
0     2.0
1     5.0
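Both snippets in this answer give one mean per row (2.0 and 5.0), whereas the question asks for one mean per array position across rows. A small sketch of the difference, assuming the lists are of equal length so they stack into a 2-D array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('a', [1, 2, 3]), ('b', [4, 5, 6])], columns=['name', 'values'])

# Stack the lists into a (rows, positions) matrix.
stacked = np.array(df['values'].tolist())

# axis=1 collapses each row: the same numbers apply produces above.
per_row = stacked.mean(axis=1)

# axis=0 collapses across rows: the result the question asks for.
per_position = stacked.mean(axis=0)

print(per_row)
print(per_position)
```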
Upvotes: 0