Reputation: 5444
I have got three different .csv files which contain the grades for students in three different assignment. I would like to read them with pandas and calculate the average for each student. The template for each file is:
Student id, Mark, extra fields, ...
4358975489, 9, ... ...
2345234523, 10, ... ...
7634565323, 7, ... ...
7653563366, 7, ... ...
... ..., ... ...
For the second assignment:
Student id, Mark, extra fields, ...
4358975489, 6, ... ...
2345234523, 8, ... ...
7634565323, 4, ... ...
7653563366, 5, ... ...
... ..., ... ...
Desired output for the two doc for instance:
Student id, average, extra fields, ...
4358975489, 7.5, ... ...
2345234523, 9, ... ...
7634565323, 5.5, ... ...
7653563366, 6, ... ...
... ..., ... ...
the same for the last doc. I want to read these docs separately and for each student id to average the Mark.
Now, my code for reading one file is the following:
i_df1 = pandas.read_csv('first.csv')
i_df2 = pandas.read_csv('second.csv')
i_df3 = pandas.read_csv('third.csv')
print (o_df.keys())
for i, row in i_df1.iterrows():
pdb.set_trace()
How can I process all three files simultaneously and extract the average grade?
Upvotes: 0
Views: 187
Reputation: 34046
If you want to process all dfs
together, you can do this:
df = df1.append([df2, df3]).groupby('Student id', as_index=False).mean()
OR:
If you want to do it simultaneously, you can use list comprehension
with df.append
and mean
:
Below is the list
of your dfs
:
In [1220]: df_list = [i_df1, i_df2, i_df3]
You can simultaneously find average of each student in a file and store the output in another list
:
In [1223]: df = [i.groupby('Student_id', as_index=False).mean() for i in df_list]
Upvotes: 1
Reputation: 142
using the data you gave assuming they get same marks three times
import pandas as pd
import numpy as np
data = [
[4358975489, 9],
[2345234523, 10],
[7634565323, 7]]
data = np.array(data)
data = pd.DataFrame(data, columns=["student", "mark"])
data1 = pd.DataFrame(data, columns=["student", "mark"])
data2 = pd.DataFrame(data, columns=["student", "mark"])
std_maks = pd.concat([data, data1, data2]).groupby('student')
print(std_maks['mark'].mean())
Upvotes: 1
Reputation: 7164
You could also use the filenames and directly concatenate the dataframes together:
fnames = ["first.csv", "second.csv", "third.csv"]
df = pd.concat(pd.read_csv(fname) for fname in fnames)
df.groupby("Student id")["Mark"].mean()
Probably practically irrelevant, but perhaps nice to know anyway: This approach doesn't load the data into your memory twice, but only once.
Upvotes: 1
Reputation: 2583
Use pd.concat to concat 3 df:
i_df1 = pandas.read_csv('first.csv')
i_df2 = pandas.read_csv('second.csv')
i_df3 = pandas.read_csv('third.csv')
df = pd.concat([i_df1, i_df2, i_df3])
df.groupby('Student id').agg({'Mark':'mean'})
Upvotes: 1