Reputation: 13
I have multiple CSV files, which I have read and appended to mySeries. I need to find the sum of the 2nd column for each file. Below is my code.
all_files = glob.glob(os.path.join(directory, "*.csv"))
all_df = []
iter = 0
for f in all_files:
    df = pd.read_csv(f)
    mySeries.append(df)
for i in range(len(mySeries)):
    total = 0
    total = sum(int(row[1]) for row in mySeries[i])
    print(total)
Computing the sum raises an error: IndexError: invalid index to scalar variable.
My data looks like this:
                  Flow
Hour
01-02-2021 20:00   374
01-02-2021 21:00   283
01-02-2021 22:00   108
01-02-2021 23:00    21
01-12-2020 20:00   400
01-12-2020 21:00   199
01-12-2020 22:00    92
01-12-2020 23:00     4
02-02-2021 00:00     1
02-02-2021 01:00     2

                  Flow
Hour
01-02-2021 20:00   605
01-02-2021 21:00   449
01-02-2021 22:00   334
01-02-2021 23:00   204
01-12-2020 20:00   668
01-12-2020 21:00   505
01-12-2020 22:00   391
01-12-2020 23:00   222
02-02-2021 00:00   137
02-02-2021 01:00    76
Upvotes: 0
Views: 232
Reputation: 14113
Just concatenate your frames into one and sum the Flow column:
all_files = glob.glob(os.path.join(directory, "*.csv"))
pd.concat([pd.read_csv(file) for file in all_files])['Flow'].sum()
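The one-liner above gives a single grand total. Since the loop in the question prints one total per file, here is a minimal sketch of that variant. The function name flow_totals and its column parameter are hypothetical names chosen for illustration, and it assumes every CSV in the directory has a Flow column.

```python
import glob
import os

import pandas as pd


def flow_totals(directory, column="Flow"):
    """Return {file name: sum of `column`} for every CSV in `directory`.

    `flow_totals` and `column` are illustrative names, not part of pandas.
    """
    totals = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # usecols restricts parsing to the one column being summed
        df = pd.read_csv(path, usecols=[column])
        totals[os.path.basename(path)] = int(df[column].sum())
    return totals
```

Calling sum(flow_totals(directory).values()) should then match the grand total from the concat approach.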
Working example below:
import pandas as pd
from io import StringIO
s1 = """Hour,Flow
01-02-2021 20:00,374
01-02-2021 21:00,283
01-02-2021 22:00,108
01-02-2021 23:00,21
01-12-2020 20:00,400
01-12-2020 21:00,199
01-12-2020 22:00,92
01-12-2020 23:00,4
02-02-2021 00:00,1
02-02-2021 01:00,2"""
s2 = """Hour,Flow
01-02-2021 20:00,605
01-02-2021 21:00,449
01-02-2021 22:00,334
01-02-2021 23:00,204
01-12-2020 20:00,668
01-12-2020 21:00,505
01-12-2020 22:00,391
01-12-2020 23:00,222
02-02-2021 00:00,137
02-02-2021 01:00,76"""
pd.concat(pd.read_csv(StringIO(file)) for file in [s1,s2])['Flow'].sum()
# 5075
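As a side note on the loop in the question: iterating directly over a DataFrame yields its column labels, not its rows, so row[1] never refers to the second column. That may not be the exact source of the IndexError (it depends on what mySeries actually holds), but it is one reason the generator expression cannot work as written. If you want the 2nd column by position rather than by name, iloc does that. A small sketch using two rows of the sample data:

```python
from io import StringIO

import pandas as pd

csv_text = """Hour,Flow
01-02-2021 20:00,374
01-02-2021 21:00,283
"""
df = pd.read_csv(StringIO(csv_text))

# Iterating over a DataFrame yields column labels, not rows.
labels = [label for label in df]  # ['Hour', 'Flow']

# Select the second column by position (index 1) and sum it.
second_column_total = int(df.iloc[:, 1].sum())  # 374 + 283 = 657
```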
Upvotes: 1