Reputation: 133
I am basically trying to concatenate many csv files of time series type in Pandas I was certain I knew how to do it but apparently I don't.
Consider this simple example:
We have three csv files:
file1.csv=
Alfa, A1, A2, A3, A4
Beta, B1, B2, B3, B4
Gamma, C1, C2, C3, C4
file2.csv=
Delta, D1, D2, D3, D4
Epsilon, E1, E2, E3, E4
Zeta, F1, F2, F3, F4
file3.csv=
Eta, G1, G2, G3, G4
Theta, H1, H2, H3, H4
Iota, I1, I2, I3, I4
1st toy code:
[In]:
import pandas as pd
a = pd.read_csv("/Test/file1.csv", header=None)
b = pd.read_csv("/Test/file2.csv", header=None)
c = pd.read_csv("/Test/file3.csv", header=None)
colnames = ["GREEK", "ROW1", "ROW2", "ROW3", "ROW4"]
concatDf = pd.concat([a,b,c], axis=0)
concatDf.columns = colnames
print(concatDf)
[Out]:
GREEK ROW1 ROW2 ROW3 ROW4
0 Alfa A1 A2 A3 A4
1 Beta B1 B2 B3 B4
2 Gamma C1 C2 C3 C4
0 Delta D1 D2 D3 D4
1 Epsilon E1 E2 E3 E4
2 Zeta F1 F2 F3 F4
0 Eta G1 G2 G3 G4
1 Theta H1 H2 H3 H4
2 Iota I1 I2 I3 I4
Not pretty, but does what it is suppose to do. Since the names of the real CSV-file are very long and the number of files are high, I thought I should make things a bit easier for me. Hence 2nd toy code:
[In]:
import glob
import pandas as pd
allFiles = glob.glob("/Test/" + "*.csv")
dfList = []
for filename in allaFiler:
df = pd.read_csv(filename)
dfList.append(df)
concatDf = pd.concat(dfList, axis=0)
print(concatDf)
[Out]:
A1 A2 A3 A4 D1 D2 D3 D4 G1 G2 G3 G4 Alfa Delta Eta
0 B1 B2 B3 B4 NaN NaN NaN NaN NaN NaN NaN NaN Beta NaN NaN
1 C1 C2 C3 C4 NaN NaN NaN NaN NaN NaN NaN NaN Gamma NaN NaN
0 NaN NaN NaN NaN E1 E2 E3 E4 NaN NaN NaN NaN NaN Epsilon NaN
1 NaN NaN NaN NaN F1 F2 F3 F4 NaN NaN NaN NaN NaN Zeta NaN
0 NaN NaN NaN NaN NaN NaN NaN NaN H1 H2 H3 H4 NaN NaN Theta
1 NaN NaN NaN NaN NaN NaN NaN NaN I1 I2 I3 I4 NaN NaN Iota
file1.csv, file2.csv, and file3.csv are the only files in the Test map. I have checked the documentation, but I have failed to find an explanation.
My questions are as follows:
Thanks in advance!
Upvotes: 1
Views: 785
Reputation: 402413
It appears all your dataframes have different names. This causes pd.concat
to misalign the dataframes during concatenation.
The solution would be to name each one consistently before concatenation. Use the names
parameter to specify the column name when reading your dataframes.
dfList = []
for filename in allaFiler:
df = pd.read_csv(filename, header=None, names=list('ABCD'))
dfList.append(df)
Upvotes: 2