How can I use python to iterate through multiple csv files and if a value is the same, update another value?

Question

I'm trying to iterate through a few .csv files, and if a value is the same, add up the related entries. If a value is not the same, add that as a new row.

My current code is:

import pandas
s1011 = pandas.read_csv('goals/1011.csv', sep=',', usecols=(1,3,5))
s1011.dropna(how="all", inplace=True)
print s1011

s1112 = pandas.read_csv('goals/1112.csv', sep=',', usecols=(1,3,5))
s1112.dropna(how="all", inplace=True)
print s1112

s1213 = pandas.read_csv('goals/1213.csv', sep=',', usecols=(1,3,5))
s1213.dropna(how="all", inplace=True)
print s1213

Currently this doesn't do much I know. It prints out 3 headings, Team, For & Against. It the prints out 20 football teams, how many goals they've scored, and how many conceded. I've tried using merge in python but it isn't suitable as it just makes one big table.

What I'm trying to do is open the multiple csv files, iterate through the list, if the Team is the same, add together goals for and goals against from each file. If Team has not been entered before, add a new row with new entries.

Is this possible using python?

Edit:

It currently prints

Team For Against

Man Utd 86 43

Man City 83 45

Chelsea 89 39

etc for 20 teams. So I want to update the For and Against entries by adding up the number of goals for each team over a number of seasons. Since it won't be the same 20 teams each season, I want to add a new row of entries if a team hasn't been in the league before.

user3582076 · Accepted Answer

Assume you have the following csv's:

df1

Team For Against

Man Utd 86 43

Man City 83 45

Chelsea 89 39

df2

Team For Against

Man Utd 88 45

Man City 85 47

ICantNameATeam 91 41

You can first stack them using pandas.concat:

df_concat = pandas.concat([df1, df2], axis=0)

which would give you:

df_concat

Team For Against

Man Utd 86 43

Man City 83 45

Chelsea 89 39

Man Utd 88 45

Man City 85 47

ICantNameATeam 91 41

Then you can use dataframe.groupby to take the sum:

df_sum = df_concat.groupby('Team').sum().reset_index()

This will group the dataframe according to unique team names and take the sum of each column.

How can I use python to iterate through multiple csv files and if a value is the same, update another value?

Answers (2)

df1

df2

df_concat

Related Questions