I'm still learning Python, so I need some help. I have the following data:
Product | No_unit_tested | Yield
A |1 |0.320
A |4 |0.780
B |5 |0.900
C |3 |0.670
C |7 |0.540
D |7 |1.000
D |9 |0.800
and I want to produce the following results:
Product |No_unit_tested |Yield |Mean
A |1 |0.320 |0.550
A |4 |0.780 |0.550
B |5 |0.900 |0.900
C |3 |0.670 |0.605
C |7 |0.540 |0.605
D |7 |1.000 |0.900
D |9 |0.800 |0.900
by using df = df.groupby('Product')['Yield'].mean().
This gives me the mean for each product, but only as one row per product, and I can't figure out how to add it as a Mean column next to every original row. How can I do this in Python using pandas?
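For reference, groupby('Product')['Yield'].mean() returns a Series indexed by Product (one value per product), which is why it does not line up with the original rows. A minimal sketch, assuming the data is in a DataFrame named df with the column names from the post, that looks up each row's product in that Series with Series.map:

```python
import pandas as pd

# Sample data from the question, built inline for illustration
df = pd.DataFrame({
    "Product": ["A", "A", "B", "C", "C", "D", "D"],
    "No_unit_tested": [1, 4, 5, 3, 7, 7, 9],
    "Yield": [0.320, 0.780, 0.900, 0.670, 0.540, 1.000, 0.800],
})

# One mean per product: a Series indexed by Product
means = df.groupby("Product")["Yield"].mean()

# Map each row's Product to its group mean, giving a row-aligned column
df["Mean"] = df["Product"].map(means)
print(df)
```

Because df is not reassigned to the groupby result, the original rows are kept and only a Mean column is added.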
Here is what you can do with plain string processing (no pandas):
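s1 = '''Product | No_unit_tested | Yield
A |1 |0.320
A |4 |0.780
B |5 |0.900
C |3 |0.670
C |7 |0.540
D |7 |1.000
D |9 |0.800'''

# Collect the yields of each product, e.g. {'A': [0.32, 0.78], ...}
d = {}
s2 = [n.strip() for n in s1.replace('|', '\n').split()]
# After the 3 header tokens, tokens come in (product, units, yield) triples,
# so s2[n-2] is the product and s2[n] is its yield
for n in range(5, len(s2), 3):
    if s2[n-2] in d:
        d[s2[n-2]].append(float(s2[n]))
    else:
        d[s2[n-2]] = [float(s2[n])]

lines = s1.split('\n')
s3 = [lines[0] + ' |Mean']
for k in d:
    mean = sum(d[k]) / len(d[k])
    for l in lines[1:]:
        # Match the Product column exactly rather than with a substring test
        if l.split('|')[0].strip() == k:
            s3.append(l + f' |{mean:.3f}')
print('\n'.join(s3))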
Output:
Product | No_unit_tested | Yield |Mean
A |1 |0.320 |0.550
A |4 |0.780 |0.550
B |5 |0.900 |0.900
C |3 |0.670 |0.605
C |7 |0.540 |0.605
D |7 |1.000 |0.900
D |9 |0.800 |0.900
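If you prefer to stay in plain Python, collections.defaultdict removes the if/else when grouping, and splitting each row on '|' avoids the token-index arithmetic. A minimal sketch, assuming the same pipe-separated text as s1 (redefined here so the block runs on its own); the names parsed, yields, means, and out are illustrative:

```python
from collections import defaultdict

s1 = '''Product | No_unit_tested | Yield
A |1 |0.320
A |4 |0.780
B |5 |0.900
C |3 |0.670
C |7 |0.540
D |7 |1.000
D |9 |0.800'''

header, *rows = s1.splitlines()

# Split each data row into stripped fields, e.g. ['A', '1', '0.320']
parsed = [[field.strip() for field in row.split('|')] for row in rows]

# Group yields by product; a missing key starts as an empty list
yields = defaultdict(list)
for product, _units, value in parsed:
    yields[product].append(float(value))

means = {product: sum(v) / len(v) for product, v in yields.items()}

# Append each row's product mean, formatted to 3 decimals
out = [header + ' |Mean']
for row, (product, _units, _value) in zip(rows, parsed):
    out.append(f'{row} |{means[product]:.3f}')
print('\n'.join(out))
```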
Here you go:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""Product|No_unit_tested|Yield
A|1|0.320
A|4|0.780
B|5|0.900
C|3|0.670
C|7|0.540
D|7|1.000
D|9|0.800"""
), sep='|')
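# Per-product mean of Yield; naming the Series 'Mean' names the joined column
means = df.groupby('Product')['Yield'].mean()
means.name = 'Mean'
# Join on the Product index, which repeats each product's mean on all its rows
result = df.set_index('Product').join(means).reset_index()
print(result)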
Output:
Product No_unit_tested Yield Mean
0 A 1 0.32 0.550
1 A 4 0.78 0.550
2 B 5 0.90 0.900
3 C 3 0.67 0.605
4 C 7 0.54 0.605
5 D 7 1.00 0.900
6 D 9 0.80 0.900
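An alternative that skips the set_index/join/reset_index round trip is groupby(...).transform('mean'), which returns a Series with the same index as the original DataFrame, so it can be assigned directly as a new column. A minimal sketch, with the sample data built inline instead of read from text:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "A", "B", "C", "C", "D", "D"],
    "No_unit_tested": [1, 4, 5, 3, 7, 7, 9],
    "Yield": [0.320, 0.780, 0.900, 0.670, 0.540, 1.000, 0.800],
})

# transform broadcasts each group's mean back to every row of that group,
# so the result is aligned with df's original index
df["Mean"] = df.groupby("Product")["Yield"].transform("mean")
print(df)
```

This also keeps the original row order and index, so no reset_index() is needed afterwards.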