Reputation: 862
I have this dataframe, consisting of 73 rows:
Date Col1 Col2 Col3
1975 float float float
1976 float float float
1976 float float float
1977 float float float
1978 float float float
....
....
Certain years appear twice because the values were taken twice that year. What I want to do is merge the rows that share a year, taking the mean of each column across those two rows. I am still familiarizing myself with pandas and I don't really understand the usage of the loc and iloc selectors. This is what I have tried, but I am sure it is completely wrong and non-pythonic:
for i in range(72):
    if df.Date[i] == df.Date[i+1]:
        df.Very_satisfied[i] = (df.Very_satisfied[i] + df.Very_satisfied[i+1]) / 2
        df.Fairly_satisfied[i] = (df.Fairly_satisfied[i] + df.Fairly_satisfied[i+1]) / 2
        df.NV_satisfied[i] = (df.NV_satisfied[i] + df.NV_satisfied[i+1]) / 2
        df.Not_satisfied[i] = (df.Not_satisfied[i] + df.Not_satisfied[i+1]) / 2
        df.DK[i] = (df.DK[i] + df.DK[i+1]) / 2
        a = i + 1
        df = df.drop(a)
where "Very_satisfied", "Fairly_satisfied", etc. are the columns. The idea behind my code is: if two years are the same, calculate the mean of each value, substitute it into the first row, and delete the second row. I really need something smarter and more elegant.
Upvotes: 0
Views: 1225
Reputation: 1080
You can use groupby() followed by mean() for this. Here is an example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': list(range(25)) * 2, 'col1': np.random.random(50) * 100, 'col2': np.random.random(50)})
df.groupby('date').mean()
This groups together all the rows that have the same date, then calculates the mean of each column within every group.
This is the output on my sample:
df.groupby('date').mean().head()
col1 col2
date
0 42.881950 0.436073
1 32.114299 0.309742
2 96.819446 0.809071
3 30.606661 0.284257
4 40.690211 0.624972
For this input:
df[df['date'] < 5]
date col1 col2
0 0 67.268605 0.393560
1 1 55.864578 0.508636
2 2 97.735942 0.861162
3 3 58.014599 0.117055
4 4 7.429489 0.637101
25 0 18.495296 0.478585
26 1 8.364020 0.110848
27 2 95.902950 0.756980
28 3 3.198724 0.451460
29 4 73.950932 0.612843
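Note that groupby('date') moves the date into the index of the result, as you can see above. If you'd rather keep it as a regular column, you can pass as_index=False (or call .reset_index() on the result). A minimal sketch, using made-up data shaped like the question's frame (the column names are just placeholders):

```python
import pandas as pd

# Small frame mimicking the question's layout:
# duplicate years that should collapse into one averaged row each.
df = pd.DataFrame({
    'Date': [1975, 1976, 1976, 1977],
    'Col1': [1.0, 2.0, 4.0, 5.0],
    'Col2': [10.0, 20.0, 40.0, 50.0],
})

# as_index=False keeps 'Date' as an ordinary column
# instead of turning it into the index of the result.
merged = df.groupby('Date', as_index=False).mean()
print(merged)
```

The duplicated 1976 rows collapse into a single row holding the column-wise means, while the unique years pass through unchanged.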
Upvotes: 1