Reputation: 21
Lets say I have a CSV which is generated yearly by my business. Each year my business decides there is a new type of data we want to collect. So Year2002.csv looks like this:
Age,Gender,Address
A,B,C
Then year2003.csv adds a new column
Age,Gender,Address,Location,
A,B,C,D
By the time we get to year 2021, my CSV now has 7 columns and looks like this:
Age,Gender,Address,Location,Height,Weight,Race
A,B,C,D,E,F,G,H
My business wants to create a single CSV which contains all of the data recorded. Where data is not available, (for example, Address data is not recorded in the 2002 CSV) there can be a 0 or a NAAN or a empty cell.
What is the best method available to merge the CSV's into a single CSV? It may be worthwhile saying, that I have 15,000 CSV files which need to be merged. ranging from 2002-2021. 2002 the CSV starts off with three columns, but by 2020, the csv has 10 columns. I want to create one 'master' spreadsheet which contains all of the data.
Just a little extra context... I am doing this because I will then be using Python to replace the empty values using the new data. E.g. calculate an average and replace CSV empty values with that average.
Hope this makes sense. I am just looking for some direction on how to best approach this. I have been playing around with excel, power bi and python but I can not figure out the best way to do this.
Upvotes: 0
Views: 264
Reputation: 9
With pandas you can use pandas.read_csv()
to create Dataframe, which you can merge using pandas.concat()
.
import pandas as pd
data1 = pd.read_csv(csv1)
data2 = pd.read_csv(csv2)
data = pd.concat(data1, data2)
Upvotes: 1
Reputation: 391
You should take a look at python csv module.
A good place to start: https://www.geeksforgeeks.org/working-csv-files-python/
It is simple and useful for reading CSVs and creating new ones.
Upvotes: 0