Reputation: 75
I am new to Python and am looking for a simple solution.
I have several .csv files with the same structure (same number of columns and rows) in one folder. The path is: C:\temp
Now I want to read all these .csv files into a new dataframe, which I later want to export as a new .csv file.
Up to now I have read each .csv file by hand and saved it into a pandas dataframe.
Here is an example:
df1 = pd.read_csv(r"C:\temp\df1.csv", header=None)
df2 = pd.read_csv(r"C:\temp\df2.csv", header=None)
df1
0 id Feature
1 1 12
2 2 13
3 3 14
4 4 15
5 5 16
6 7 17
7 8 15
8 9 12
9 10 13
10 11 23
Then I used .append to merge the dataframes.
df_new = df1.append(df2)
0 id Feature
1 1 12
2 2 13
3 3 14
4 4 15
5 5 16
6 7 17
7 8 15
8 9 12
9 10 13
10 11 23
0 id Feature
1 1 14
2 2 9
3 3 3
4 4 8
5 5 9
6 7 1
7 8 32
8 9 7
9 10 3
10 11 12
df_new.to_csv('df_new.csv', index=False)
Unfortunately, the exported file always includes the header row, which I don't need, so I have been deleting it by hand afterwards.
Isn't there a faster way? I'm thinking of a for loop that opens all existing .csv files in the path, reads them into a new dataframe, and at the end writes that dataframe out as a .csv file. Unfortunately I have no experience with loops.
I appreciate your help.
Upvotes: 0
Views: 110
Reputation: 4975
You could do something like this:
frames = []
for i in range(1, 11):
    frames.append(pd.read_csv(r"C:\temp\df%d.csv" % i, header=None))
df = pd.concat(frames)
This concatenates all csv files into one dataframe, and it works if all filenames are structured like df1.csv
, df2.csv
etc. and you know the number of files. If you just want to loop over all files in the folder, you could do something like this:
import glob
import pandas as pd

path = r"C:\temp\*.csv"
frames = []
for fname in glob.glob(path):
    frames.append(pd.read_csv(fname, header=None))
df = pd.concat(frames)
The extended version, without a loop would be:
df1 = pd.read_csv(r"C:\temp\df1.csv")
df2 = pd.read_csv(r"C:\temp\df2.csv")
df3 = pd.read_csv(r"C:\temp\df3.csv")
frames = [df1, df2, df3]
df = pd.concat(frames)
Afterwards you could save the dataframe back to a csv file:
df.to_csv(file_name)
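Since the unwanted header row in the exported file was the original complaint: to_csv can suppress it directly with header=False (and the row index with index=False), so no manual cleanup is needed. A small self-contained sketch, with inline frames standing in for the files read from C:\temp with header=None:

```python
import pandas as pd

# Two small frames standing in for CSVs read with header=None
# (column labels are then just 0, 1, ...).
df1 = pd.DataFrame({0: [1, 2], 1: [12, 13]})
df2 = pd.DataFrame({0: [1, 2], 1: [14, 9]})

df = pd.concat([df1, df2], ignore_index=True)

# header=False drops the column names from the output,
# index=False drops the row index.
csv_text = df.to_csv(header=False, index=False)
print(csv_text)
```

Passing no path to to_csv returns the CSV as a string, which is handy for checking the output; with a path it writes the file instead.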
Upvotes: 1
Reputation: 620
In [1]: import pandas as pd
In [2]: from io import StringIO
In [3]: df = pd.read_csv(StringIO("""0 id Feature
...: 1 1 12
...: 2 2 13
...: 3 3 14
...: 4 4 15
...: 5 5 16
...: 6 7 17
...: 7 8 15
...: 8 9 12
...: 9 10 13
...: 10 11 23"""), sep=' ')
In [4]: df1 = pd.read_csv(StringIO("""0 id Feature
...: 1 1 14
...: 2 2 9
...: 3 3 3
...: 4 4 8
...: 5 5 9
...: 6 7 1
...: 7 8 32
...: 8 9 7
...: 9 10 3
...: 10 11 12"""), sep=' ')
In [10]: pd.concat([df, df1])
Out[10]:
0 id Feature
0 1 1 12
1 2 2 13
2 3 3 14
3 4 4 15
4 5 5 16
5 6 7 17
6 7 8 15
7 8 9 12
8 9 10 13
9 10 11 23
0 1 1 14
1 2 2 9
2 3 3 3
3 4 4 8
4 5 5 9
5 6 7 1
6 7 8 32
7 8 9 7
8 9 10 3
9 10 11 12
In [11]: %timeit pd.concat([df, df1])
188 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [14]: df.append(df1)
Out[14]:
0 id Feature
0 1 1 12
1 2 2 13
2 3 3 14
3 4 4 15
4 5 5 16
5 6 7 17
6 7 8 15
7 8 9 12
8 9 10 13
9 10 11 23
0 1 1 14
1 2 2 9
2 3 3 3
3 4 4 8
4 5 5 9
5 6 7 1
6 7 8 32
7 8 9 7
8 9 10 3
9 10 11 12
In [15]: %timeit df.append(df1)
197 µs ± 4.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
With pandas version '1.1.3', you can clearly see that pd.concat
is slightly faster than df.append(df1)
.
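Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is the only option in current versions anyway. A minimal sketch of the replacement, using ignore_index=True to get a fresh 0..n index instead of the repeated 0..9, 0..9 indices shown above:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Feature": [12, 13]})
df1 = pd.DataFrame({"id": [1, 2], "Feature": [14, 9]})

# Drop-in replacement for the removed df.append(df1);
# ignore_index=True renumbers the rows 0..3 instead of 0, 1, 0, 1.
combined = pd.concat([df, df1], ignore_index=True)
print(combined)
```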
For working with loops, you can create a list of filenames and build a list of dataframes from those files with a for loop, something like this:
filenames = ['1.csv', '2.csv']
dfs = []
for name in filenames:
    dfs.append(pd.read_csv(name))
new_df = pd.concat(dfs)
This is easy, efficient, clean, and fast.
And then save the file to csv.
new_df.to_csv(out_filename)
Upvotes: 0