Reputation: 103
I have multiple csv
files in the following manner. All of the files have the same format.
| | items | per_unit_amount | number of units |
|---:|:--------|------------------:|------------------:|
| 0 | book | 25 | 5 |
| 1 | pencil | 3 | 10 |
First, I want to calculate the total amount of bills in python. Once calculated the total amount, I need to calculate the total amount of bills for all the csv
files at the same time i.e in a multi-threaded manner.
I need to do it using multi threading.
Upvotes: 1
Views: 553
Reputation: 1089
this would be my way, first merge all CSV files then sum each item:
import glob
import os
import pandas as pd
# the path to your csv file directory
mycsvdir = 'C:\\your csv location\\your csv location'
#select all csv file you can have some kind of filter too
csvfiles = glob.glob(os.path.join(mycsvdir, '*.csv'))
# loop through the files and read them in with pandas
dataframes = [] # a list to hold all the individual pandas DataFrames
for csvfile in csvfiles:
df = pd.read_csv(csvfile)
dataframes.append(df)
# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)
# print out to a new csv file
result.to_csv('all.csv')
now you have all.csv file that is the merge of your CSV file. we can sum any item now by below code:
dff = pd.read_csv('C:\\output folder\\output folder\\all.csv')
table = pd.pivot_table(dff, index =['items', 'per_unit_amount'])
print(table)
Upvotes: 1
Reputation: 1731
You can use pandas
library to achieve that. Install pandas via, pip install pandas
.
The workflow should go like this:
csv
files via glob
from glob import glob
import pandas as pd
# getting a list of all the csv files' path
filenames = glob('./*csv')
# list of dataframes
dfs = [pd.read_csv(filename) for filename in filenames]
# concat all dataframes into one dataframe
big_df = pd.concat(dfs, ignore_index=True)
The big_df
should look like this. Here, I have used two csv files with two rows of input. So the concatenated dataframe has 4 rows in total.
| | items | per_unit_amount | number of units |
|---:|:--------|------------------:|------------------:|
| 0 | book | 25 | 5 |
| 1 | pencil | 3 | 10 |
| 2 | book | 25 | 5 |
| 3 | pencil | 3 | 10 |
Now let's multiply per_unit_amount
with number of units
to get unit_total
:
big_df['unit_total'] = big_df['per_unit_amount'] * big_df['number of units']
Now the dataframe has an extra column:
| | items | per_unit_amount | number of units | unit_total |
|---:|:--------|------------------:|------------------:|-------------:|
| 0 | book | 25 | 5 | 125 |
| 1 | pencil | 3 | 10 | 30 |
| 2 | book | 25 | 5 | 125 |
| 3 | pencil | 3 | 10 | 30 |
You can calculate the total by summing all the entries in the unit_total
column:
total_amount = big_df['unit_total'].sum()
> 310
Upvotes: 0