CEamonn
CEamonn

Reputation: 925

How can I use python to iterate through multiple csv files and if a value is the same, update another value?

I'm trying to iterate through a few .csv files, and if a value is the same, add up the related entries. If a value is not the same, add that as a new row.

My current code is:

import pandas
s1011 = pandas.read_csv('goals/1011.csv', sep=',', usecols=(1,3,5))
s1011.dropna(how="all", inplace=True)
print s1011

s1112 = pandas.read_csv('goals/1112.csv', sep=',', usecols=(1,3,5))
s1112.dropna(how="all", inplace=True)
print s1112

s1213 = pandas.read_csv('goals/1213.csv', sep=',', usecols=(1,3,5))
s1213.dropna(how="all", inplace=True)
print s1213

Currently this doesn't do much I know. It prints out 3 headings, Team, For & Against. It the prints out 20 football teams, how many goals they've scored, and how many conceded. I've tried using merge in python but it isn't suitable as it just makes one big table.

What I'm trying to do is open the multiple csv files, iterate through the list, if the Team is the same, add together goals for and goals against from each file. If Team has not been entered before, add a new row with new entries.

Is this possible using python?

Edit:

It currently prints

Team For Against

Man Utd 86 43

Man City 83 45

Chelsea 89 39

etc for 20 teams. So I want to update the For and Against entries by adding up the number of goals for each team over a number of seasons. Since it won't be the same 20 teams each season, I want to add a new row of entries if a team hasn't been in the league before.

Upvotes: 0

Views: 352

Answers (2)

user3582076
user3582076

Reputation: 1343

Assume you have the following csv's:

df1

Team For Against

Man Utd 86 43

Man City 83 45

Chelsea 89 39

df2

Team For Against

Man Utd 88 45

Man City 85 47

ICantNameATeam 91 41

You can first stack them using pandas.concat:

df_concat = pandas.concat([df1, df2], axis=0)

which would give you:

df_concat

Team For Against

Man Utd 86 43

Man City 83 45

Chelsea 89 39

Man Utd 88 45

Man City 85 47

ICantNameATeam 91 41

Then you can use dataframe.groupby to take the sum:

df_sum = df_concat.groupby('Team').sum().reset_index()

This will group the dataframe according to unique team names and take the sum of each column.

Upvotes: 1

Knowledgeable Bear
Knowledgeable Bear

Reputation: 27

Here's one approach:

  1. Use Pandas to append the file's identifying number to the For & Against column in each dataframe (i.e. For-1011, For-1112, For-1213).
  2. Use a full outer join on 'Team' to bring in all rows into a new dataframe.
  3. Sum the columns as you'd like: df[For-total] = df[For-1011] + df[For-1112] + df[For-1213]

Upvotes: 0

Related Questions