anna

Reputation: 35

Find the mean of a column in a data set using pandas and python

I am new to programming and have a small problem with a project I created in Jupyter Notebook. First, I installed pandas, TensorFlow and NumPy and imported a data set. Then I printed the data with pandas (see picture):

[Image: the output table]

Now I want to compute the mean of the values in each cell of the 'Votes' column (the cells contain strings) and write that mean back into the column in place of the string. I've already tried everything, but unfortunately I can't find the solution.

I hope one of you can help me :)

Upvotes: 2

Views: 866

Answers (2)

Mayank Porwal

Reputation: 34076

Take the dataframe below as an example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'A': [[1, 2, 3, 2], [2, 4, 4, 2], [3, 1, 3]], 'B': [1.03, 1.04, 1.05]})

In [3]: df
Out[3]: 
              A    B
0  [1, 2, 3, 2] 1.03
1  [2, 4, 4, 2] 1.04
2     [3, 1, 3] 1.05

You can do:

In [4]: import statistics

In [5]: df['A'] = df['A'].apply(statistics.mean)

In [6]: df
Out[6]: 
     A    B
0 2.00 1.03
1 3.00 1.04
2 2.33 1.05
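A possible alternative, sketched here as a different technique from the answer above rather than part of it: `Series.explode()` (pandas 0.25 or later) puts each list element on its own row while keeping the original index, so grouping on that index level and taking the mean gives one value per original row without calling a Python function per cell. It assumes column 'A' already holds real lists, as in the example frame; string cells would need to be parsed first.

```python
import pandas as pd

# Same example frame as above: column 'A' holds Python lists, 'B' holds floats.
df = pd.DataFrame({'A': [[1, 2, 3, 2], [2, 4, 4, 2], [3, 1, 3]], 'B': [1.03, 1.04, 1.05]})

# explode() yields one row per list element, repeating the original index label;
# explode() leaves object dtype, so cast to float before averaging.
exploded = df['A'].explode().astype(float)

# Group the exploded values back by their original row label and average them.
# The result is indexed 0, 1, 2 again, so it aligns when assigned back.
df['A'] = exploded.groupby(level=0).mean()

print(df)
```

One design difference: an empty list becomes NaN here, whereas `statistics.mean([])` raises `StatisticsError`.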

Upvotes: 1

Stef

Reputation: 30589

You have a string representation of a list in each cell, so first you need to convert that string to a list. You could use eval for that, but for reasons explained here it's better to use ast.literal_eval. From the resulting list of numbers you then calculate the mean with NumPy's mean. Apply all of this to the column:

import numpy as np
import ast
dataset.Votes = dataset.Votes.apply(lambda x: np.mean(ast.literal_eval(x)))
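The question's actual data isn't shown, so the following is only a sketch under an assumption: a hypothetical `dataset` whose 'Votes' cells are strings such as `'[1, 2, 3, 2]'` (the values are made up for illustration). It runs the one-liner above on that data and also shows why `ast.literal_eval` is the safer choice: it accepts only Python literals and raises `ValueError` for anything else, such as a function call, instead of executing it the way `eval` would.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data set: each 'Votes' cell is a
# *string* that looks like a list (made-up values, for illustration only).
dataset = pd.DataFrame({'Votes': ['[1, 2, 3, 2]', '[2, 4, 4, 2]', '[3, 1, 3]']})

# Parse each string into a real list, then replace the cell with its mean.
dataset.Votes = dataset.Votes.apply(lambda x: np.mean(ast.literal_eval(x)))
print(dataset)

# literal_eval only accepts literals (numbers, strings, lists, dicts, ...),
# so an expression containing a function call is rejected, not executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print('rejected:', err)
```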

Upvotes: 1
