Reputation: 35
I am new to programming and have a small problem with a project I created in a Jupyter notebook. First, I installed pandas, tensorflow and numpy and imported a data set. Then I printed the table with pandas (see picture).
Now I want to compute the mean of each list in the 'Votes' column (the lists are stored as strings) and replace each string with its mean. I've already tried everything I could think of, but unfortunately I can't find the solution.
I hope someone of you can help me :)
Upvotes: 2
Views: 866
Reputation: 34076
Take the DataFrame below as an example:
In [2210]: df = pd.DataFrame({'A':[[1,2, 3,2],[2,4,4,2],[3,1,3]], 'B':[1.03, 1.04, 1.05]})
In [2204]: df
Out[2204]:
              A     B
0  [1, 2, 3, 2]  1.03
1  [2, 4, 4, 2]  1.04
2     [3, 1, 3]  1.05
You can do:
In [2213]: import statistics
In [2211]: df['A'] = df['A'].apply(lambda x: statistics.mean(x))
In [2212]: df
Out[2212]:
      A     B
0  2.00  1.03
1  3.00  1.04
2  2.33  1.05
Upvotes: 1
Reputation: 30589
Each cell contains the string representation of a list, so first you need to convert that string to a list. You can use eval for this, but for the reasons explained here it is better to use ast.literal_eval. From the resulting list of numbers you then calculate the mean using numpy's mean. You apply all of this to the column:
import numpy as np
import ast
dataset.Votes = dataset.Votes.apply(lambda x: np.mean(ast.literal_eval(x)))
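If you want to try this without your own data, here is a minimal self-contained sketch. The sample strings and the helper name mean_of_list_string are made up for illustration, and it assumes every cell looks like "[1, 2, 3]"; an empty list is mapped to NaN, which is my own choice and not something from the question.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data set: each cell is a string
# that looks like a Python list of numbers.
dataset = pd.DataFrame({"Votes": ["[1, 2, 3, 2]", "[2, 4, 4, 2]", "[3, 1, 3]"]})


def mean_of_list_string(cell):
    """Parse a string such as "[1, 2, 3]" and return the mean of its numbers."""
    values = ast.literal_eval(cell)  # only evaluates literals, unlike eval
    if len(values) == 0:
        return float("nan")  # assumption: treat an empty list as missing
    return float(np.mean(values))


# Replace each string in the column with the mean of the list it encodes.
dataset["Votes"] = dataset["Votes"].apply(mean_of_list_string)
print(dataset)
```

If some cells are malformed, literal_eval raises an exception (typically ValueError or SyntaxError), so you may want to clean those rows first.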
Upvotes: 1