PythonNoob
PythonNoob

Reputation: 21

Merging multiple CSV's with different columns

Lets say I have a CSV which is generated yearly by my business. Each year my business decides there is a new type of data we want to collect. So Year2002.csv looks like this:

Age,Gender,Address 
A,B,C

Then year2003.csv adds a new column

Age,Gender,Address,Location, 
A,B,C,D

By the time we get to year 2021, my CSV now has 7 columns and looks like this:

Age,Gender,Address,Location,Height,Weight,Race
A,B,C,D,E,F,G,H

My business wants to create a single CSV which contains all of the data recorded. Where data is not available, (for example, Address data is not recorded in the 2002 CSV) there can be a 0 or a NAAN or a empty cell.

What is the best method available to merge the CSV's into a single CSV? It may be worthwhile saying, that I have 15,000 CSV files which need to be merged. ranging from 2002-2021. 2002 the CSV starts off with three columns, but by 2020, the csv has 10 columns. I want to create one 'master' spreadsheet which contains all of the data.

Just a little extra context... I am doing this because I will then be using Python to replace the empty values using the new data. E.g. calculate an average and replace CSV empty values with that average.

Hope this makes sense. I am just looking for some direction on how to best approach this. I have been playing around with excel, power bi and python but I can not figure out the best way to do this.

Upvotes: 0

Views: 264

Answers (2)

fhirsch
fhirsch

Reputation: 9

With pandas you can use pandas.read_csv() to create Dataframe, which you can merge using pandas.concat().

import pandas as pd

data1 = pd.read_csv(csv1)
data2 = pd.read_csv(csv2)

data = pd.concat(data1, data2)

Upvotes: 1

Y.R.
Y.R.

Reputation: 391

You should take a look at python csv module.

A good place to start: https://www.geeksforgeeks.org/working-csv-files-python/

It is simple and useful for reading CSVs and creating new ones.

Upvotes: 0

Related Questions