userjuicer
userjuicer

Reputation: 133

Concatenating (many) CSV files via Pandas

I am basically trying to concatenate many csv files of time series type in Pandas I was certain I knew how to do it but apparently I don't.

Consider this simple example:

We have three csv files:

file1.csv=

Alfa, A1, A2, A3, A4
Beta, B1, B2, B3, B4
Gamma, C1, C2, C3, C4

file2.csv=

Delta, D1, D2, D3, D4
Epsilon, E1, E2, E3, E4
Zeta, F1, F2, F3, F4

file3.csv=

Eta, G1, G2, G3, G4
Theta, H1, H2, H3, H4
Iota, I1, I2, I3, I4

1st toy code:

[In]:

import pandas as pd
a = pd.read_csv("/Test/file1.csv", header=None)
b = pd.read_csv("/Test/file2.csv", header=None)
c = pd.read_csv("/Test/file3.csv", header=None)
colnames = ["GREEK", "ROW1", "ROW2", "ROW3", "ROW4"]
concatDf = pd.concat([a,b,c], axis=0)
concatDf.columns = colnames
print(concatDf)

[Out]:

     GREEK ROW1 ROW2 ROW3 ROW4
0     Alfa   A1   A2   A3   A4
1     Beta   B1   B2   B3   B4
2    Gamma   C1   C2   C3   C4
0    Delta   D1   D2   D3   D4
1  Epsilon   E1   E2   E3   E4
2     Zeta   F1   F2   F3   F4
0      Eta   G1   G2   G3   G4
1    Theta   H1   H2   H3   H4
2     Iota   I1   I2   I3   I4

Not pretty, but does what it is suppose to do. Since the names of the real CSV-file are very long and the number of files are high, I thought I should make things a bit easier for me. Hence 2nd toy code:

[In]:

import glob
import pandas as pd

 allFiles = glob.glob("/Test/" + "*.csv")
 dfList = []
 for filename in allaFiler:
    df = pd.read_csv(filename)
    dfList.append(df)

 concatDf = pd.concat(dfList, axis=0)
 print(concatDf)

[Out]:

    A1  A2  A3  A4  D1  D2  D3  D4  G1  G2  G3  G4 Alfa   Delta   Eta
0   B1  B2  B3  B4 NaN NaN NaN NaN NaN NaN NaN NaN Beta     NaN   NaN   
1   C1  C2  C3  C4 NaN NaN NaN NaN NaN NaN NaN NaN Gamma    NaN   NaN   
0  NaN NaN NaN NaN  E1  E2  E3  E4 NaN NaN NaN NaN  NaN Epsilon   NaN   
1  NaN NaN NaN NaN  F1  F2  F3  F4 NaN NaN NaN NaN  NaN    Zeta   NaN   
0  NaN NaN NaN NaN NaN NaN NaN NaN  H1  H2  H3  H4  NaN     NaN Theta  
1  NaN NaN NaN NaN NaN NaN NaN NaN  I1  I2  I3  I4  NaN     NaN  Iota

file1.csv, file2.csv, and file3.csv are the only files in the Test map. I have checked the documentation, but I have failed to find an explanation.

My questions are as follows:

  1. Why doesn't code 2 yield the same result as code 1? I am clearly missing something very fundamental, so please help me to understand what is going on.
  2. What changes do I have to do to make code 2 yield the same result as code 1?

Thanks in advance!

Upvotes: 1

Views: 785

Answers (1)

cs95
cs95

Reputation: 402413

It appears all your dataframes have different names. This causes pd.concat to misalign the dataframes during concatenation.

The solution would be to name each one consistently before concatenation. Use the names parameter to specify the column name when reading your dataframes.

dfList = []
for filename in allaFiler:
   df = pd.read_csv(filename, header=None, names=list('ABCD'))
   dfList.append(df)

Upvotes: 2

Related Questions