Reputation: 2299
I have various csv files that have the same columns, and need to concatenate the data into one dataframe/csv for an index of 1-24. The name of each csv file then becomes the name of it's relevant column. Example:
one.csv two.csv
Time hour energy Time hour energy
00:59 1 0 00:59 1 0
01:59 2 3 01:59 2 5
02:59 3 2 02:59 3 8
... ...
23:59 24 6 23:59 24 3
Combined.csv
hour one two
1 0 0
2 3 5
3 2 8
...
24 6 3
The closest I have seen is this:
path = r'C:\files'
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
But it's not quite right as it adds rows, not columns.
Upvotes: 0
Views: 138
Reputation: 210832
You should use pd.concat(..., axis=1)
parameter in order to concatenate DFs horizontally:
import os
import glob
import pandas as pd
In [46]: files = glob.glob(r'D:\temp\.data\42011160\*.csv')
In [47]: pd.concat([pd.read_csv(f, usecols=['hour', 'energy'], index_col='hour')
...: .rename(columns=lambda x: os.path.basename(os.path.splitext(f)[0]))
...: for f in files],
...: axis=1).reset_index()
...:
Out[47]:
hour 1 2
0 1 0 0
1 2 3 5
2 3 2 8
3 24 6 3
where:
In [48]: files
Out[48]: ['D:\\temp\\.data\\42011160\\1.csv', 'D:\\temp\\.data\\42011160\\2.csv']
Upvotes: 1