Python Merge Pandas Dataframe

Question

I am new to Python and am looking for a simple solution.

I have several .csv files with the same structure (number of columns and lines) in one folder. The path is: C:\temp

Now I want to read all these .csv files into a new dataframe, which I want to export later as a new .csv file.

Up to now I have read each .csv file by hand and saved it into a pandas dataframe.

Here is an example:

df1 = pd.read_csv(r"C:\temp\df1.csv", header=None)
df2 = pd.read_csv(r"C:\temp\df2.csv", header=None)

df1

0 id Feature
1 1 12
2 2 13
3 3 14
4 4 15
5 5 16
6 7 17
7 8 15
8 9 12
9 10 13
10 11 23

Then I used .append to merge the dataframes.

df_new = df1.append(df2)

0   id  Feature
1   1   12
2   2   13
3   3   14
4   4   15
5   5   16
6   7   17
7   8   15
8   9   12
9   10  13
10  11  23
0   id  Feature
1   1   14
2   2   9
3   3   3
4   4   8
5   5   9
6   7   1
7   8   32
8   9   7
9   10  3
10  11  12

df_new.to_csv('df_new.csv', index=False)

Unfortunately, with this approach the header row always ends up in the data, and I don't need it there. So I deleted it afterwards by hand.

Isn't there a faster way? I'm thinking of a for loop that opens all existing .csv files in the path, reads them line by line into a new dataframe, and at the end of the loop writes a .csv file from it. Unfortunately I have no experience with loops.

I appreciate your help.

thelogicalkoan · Accepted Answer

In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: df = pd.read_csv(StringIO("""0 id Feature
   ...: 1 1 12
   ...: 2 2 13
   ...: 3 3 14
   ...: 4 4 15
   ...: 5 5 16
   ...: 6 7 17
   ...: 7 8 15
   ...: 8 9 12
   ...: 9 10 13
   ...: 10 11 23"""), sep=' ')

In [4]: df1 = pd.read_csv(StringIO("""0   id  Feature
   ...: 1   1   14
   ...: 2   2   9
   ...: 3   3   3
   ...: 4   4   8
   ...: 5   5   9
   ...: 6   7   1
   ...: 7   8   32
   ...: 8   9   7
   ...: 9   10   3
   ...: 10   11   12"""), sep=r'\s+')

In [10]: pd.concat([df, df1])
Out[10]: 
    0  id  Feature
0   1   1       12
1   2   2       13
2   3   3       14
3   4   4       15
4   5   5       16
5   6   7       17
6   7   8       15
7   8   9       12
8   9  10       13
9  10  11       23
0   1   1       14
1   2   2        9
2   3   3        3
3   4   4        8
4   5   5        9
5   6   7        1
6   7   8       32
7   8   9        7
8   9  10        3
9  10  11       12

In [11]: %timeit pd.concat([df, df1])

188 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [14]: df.append(df1)
Out[14]: 
    0  id  Feature
0   1   1       12
1   2   2       13
2   3   3       14
3   4   4       15
4   5   5       16
5   6   7       17
6   7   8       15
7   8   9       12
8   9  10       13
9  10  11       23
0   1   1       14
1   2   2        9
2   3   3        3
3   4   4        8
4   5   5        9
5   6   7        1
6   7   8       32
7   8   9        7
8   9  10        3
9  10  11       12

In [15]: %timeit df.append(df1)
197 µs ± 4.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

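These timings were taken with pandas version '1.1.3'.

In this measurement pd.concat is slightly faster than df.append(df1), although the gap (188 µs vs. 197 µs) is only about twice the reported standard deviation. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is the call to use on current versions.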

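To work with loops, you can keep the filenames in a list, read each file in a for loop, and collect the resulting dataframes in a second list, something like this:

filenames = ['1.csv', '2.csv']

dfs = []

for name in filenames:
    # By default read_csv uses the first line of each file as column names
    dfs.append(pd.read_csv(name))

# Stack all dataframes row-wise in a single call
new_df = pd.concat(dfs)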

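This is simple and clean, and it is also faster than appending to a dataframe inside the loop, because pd.concat copies the data once instead of on every iteration.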
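The repeated header row in the question most likely comes from passing header=None, which tells read_csv that the file has no header line. A minimal sketch of the difference, assuming comma-separated files with an id,Feature header (the sample text below is made up for illustration):

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for one of the CSV files.
csv_text = "id,Feature\n1,12\n2,13\n"

# header=None treats the first line as data, so "id,Feature" becomes a row.
with_header_row = pd.read_csv(StringIO(csv_text), header=None)

# The default uses the first line as column names instead.
clean = pd.read_csv(StringIO(csv_text))

print(with_header_row.shape)  # (3, 2)
print(clean.shape)            # (2, 2)
print(list(clean.columns))    # ['id', 'Feature']
```

Leaving header at its default should therefore remove the need to delete the header rows by hand.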

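Then save the result to a CSV file, where out_filename is the path of the output file. Passing index=False, as in the question, leaves the row index out of the file.

new_df.to_csv(out_filename, index=False)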
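If you would rather not type the filenames by hand, the loop can pick up every .csv file in the folder. The following is only a sketch under a few assumptions: the helper name merge_csv_folder and its parameters are made up for this example, every file has the same header line, and the files are comma-separated.

```python
from pathlib import Path

import pandas as pd


def merge_csv_folder(folder, out_path, pattern="*.csv"):
    """Read every CSV matching `pattern` in `folder`, stack them, write one CSV.

    Hypothetical helper for illustration; not part of pandas.
    """
    out_path = Path(out_path)
    # Sort for a reproducible row order, and skip the output file
    # in case it lives in the same folder as the inputs.
    files = sorted(
        p for p in Path(folder).glob(pattern)
        if p.resolve() != out_path.resolve()
    )
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")

    # The first line of each file becomes the column names (read_csv default).
    frames = [pd.read_csv(f) for f in files]

    # ignore_index=True renumbers the rows 0..n-1 instead of repeating 0..9.
    merged = pd.concat(frames, ignore_index=True)

    # index=False keeps the row numbers out of the output file.
    merged.to_csv(out_path, index=False)
    return merged
```

For the folder in the question, a call might look like merge_csv_folder(r"C:\temp", r"C:\temp\df_new.csv"). The output then contains a single header line followed by the rows of all files.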
