Reputation: 103

process multiple csv file in python

I have multiple csv files in the following manner. All of the files have the same format.

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |

First, I want to calculate the total amount of bills in python. Once calculated the total amount, I need to calculate the total amount of bills for all the csv files at the same time i.e in a multi-threaded manner.

I need to do it using multi threading.

Upvotes: 1

Answers (2)

Mohsen

Reputation: 1089

this would be my way, first merge all CSV files then sum each item:

import glob
import os
import pandas as pd

# the path to your csv file directory
mycsvdir = 'C:\\your csv location\\your csv location'

#select all csv file you can have some kind of filter too
csvfiles = glob.glob(os.path.join(mycsvdir, '*.csv'))

# loop through the files and read them in with pandas
dataframes = []  # a list to hold all the individual pandas DataFrames
for csvfile in csvfiles:
    df = pd.read_csv(csvfile)
    dataframes.append(df)

# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)

# print out to a new csv file
result.to_csv('all.csv')

now you have all.csv file that is the merge of your CSV file. we can sum any item now by below code:

dff = pd.read_csv('C:\\output folder\\output folder\\all.csv')


table = pd.pivot_table(dff, index =['items', 'per_unit_amount'])
print(table)

Upvotes: 1

rednafi

Reputation: 1731

You can use pandas library to achieve that. Install pandas via, pip install pandas.

The workflow should go like this:

Get a list of the filenames (filepath actually) of the csv files via glob
Iterate the filenames, load the files using pandas and keep them in a list
Concat the list of the dataframes into a big dataframe
Perform you desired calculations

from glob import glob
import pandas as pd

# getting a list of all the csv files' path
filenames = glob('./*csv')

# list of dataframes
dfs = [pd.read_csv(filename) for filename in filenames]

# concat all dataframes into one dataframe
big_df = pd.concat(dfs, ignore_index=True)

The big_df should look like this. Here, I have used two csv files with two rows of input. So the concatenated dataframe has 4 rows in total.

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |
|  2 | book    |                25 |                 5 |
|  3 | pencil  |                 3 |                10 |

Now let's multiply per_unit_amount with number of units to get unit_total:

big_df['unit_total'] = big_df['per_unit_amount'] * big_df['number of units']

Now the dataframe has an extra column:

|    | items   |   per_unit_amount |   number of units |   unit_total |
|---:|:--------|------------------:|------------------:|-------------:|
|  0 | book    |                25 |                 5 |          125 |
|  1 | pencil  |                 3 |                10 |           30 |
|  2 | book    |                25 |                 5 |          125 |
|  3 | pencil  |                 3 |                10 |           30 |

You can calculate the total by summing all the entries in the unit_total column:

total_amount = big_df['unit_total'].sum()

> 310

Upvotes: 0

process multiple csv file in python

Answers (2)

Related Questions