Reputation: 159
I currently have about 500 csv files worth of time series data. I am joining them using this code:
from glob import glob
with open('master.csv', 'a') as singleFile:
for csv in glob('*data.csv'):
if csv == 'master.csv':
pass
else:
for line in open(csv, 'r'):
singleFile.write(line)
However there is an issue with this, the headings of each file is not the same, as the first row has a list of companies, now as time goes on some companies retire and some companies join the market, thus there are inconsistencies as I scroll down my merged file.
To illustrate my problem I'll do an example:
This is currently the format of the first three of my csv files.
My desired output looks like this:
Currently I am arriving at my desired output by manually moving around the columns, is there any way when merging my files python can detect a new column and move it to the right?
Many thanks!
Upvotes: 2
Views: 377
Reputation: 12042
1.csv
:
a,b
1,2
1,2
2.csv
:
b,c
2,3
2,3
3.csv
:
c,d
3,4
3,4
You can use pandas to load and convert the CSV files to dataframes. After that you can concatenate them to a single dataframe which can be saved back to a CSV file.
A full example:
from glob import glob
import pandas as pd
df = pd.concat([pd.read_csv(f, delimiter=',') for f in glob('*.csv') if not f.startswith('all.csv')], ignore_index=True, sort=False)
print(df)
# a b c d
# 0 1.0 2.0 NaN NaN
# 1 1.0 2.0 NaN NaN
# 2 NaN NaN 3.0 4.0
# 3 NaN NaN 3.0 4.0
# 4 NaN 2.0 3.0 NaN
# 5 NaN 2.0 3.0 NaN
df.to_csv('all.csv', index=False)
# a,b,c,d
# 1.0,2.0,,
# 1.0,2.0,,
# ,,3.0,4.0
# ,,3.0,4.0
# ,2.0,3.0,
# ,2.0,3.0,
Image source: https://pandas.pydata.org/pandas-docs/version/0.20/merging.html
Upvotes: 1