mountaindiu
mountaindiu

Reputation: 41

How to merge multiple csv files into one file with specific columns on pandas, python?

I have 4 different csv files.

csv1:

ID      Fruit
1001    Apple
1002    Banana
1003    Kiwi

csv2:

ID      Color
1001    Green
1005    Red
1006    Orange
1007    Yellow

csv3:

ID      Size
1001    Large
1008    Small
1009    Medium
1010    Large

csv4:

ID      Price
1002    20
1009    40
1010    30
1011    50

And this is a master csv file that I want to make:

Number  ID      Fruit   Color   Size    Price

1       1001    Apple   Green   Large   
2       1002    Banana                  20
3       1003    Kiwi            
4       1005            Red     
5       1006            Orange      
6       1007            Yellow      
7       1008            Small   
8       1009            Medium          40
9       1010            Large           30
10      1011                            50

I think using pandas will be easier to make it, but I have no idea on Python.

As each csv file has different columns, how can I choose the column and paste all of them together on master csv file? If there is no information, I want to make it NULL or N/A value.

You can see the each of csv files and the master: Click here for image

I spent 6 hours already but I have no idea how to do this.

Thank you in advance.

Upvotes: 4

Views: 3235

Answers (4)

Inuganhu
Inuganhu

Reputation: 31

Concat can be used to append data to dataframe.   
            
                file_path1 = r"path1.csv"
                df1 = pd.read_csv(file_path1)
                file_path2 = r"path2.csv"
                df2 = pd.read_csv(file_path2)
                file_path3 = r"path3.csv"
                df3 = pd.read_csv(file_path3)
                frames = [df1,df2,df3]
                df = pd.concat(frames)
reset index after concat
               df.reset_index(inplace=True)

Upvotes: 0

Andrei Berenda
Andrei Berenda

Reputation: 1986

You can use merge:

import pandas as pd

df1 = pd.read_csv('1.csv')
df2 = pd.read_csv('2.csv')
df3 = pd.read_csv('3.csv')
df4 = pd.read_csv('4.csv')
df = df1.merge(df2).merge(df3).merge(df4)
df.to_csv('result.csv')

Upvotes: 0

ALollz
ALollz

Reputation: 59549

reduce + combine_first

The key is to set 'ID' as the index that way we get the proper alignment across both axes. I've assumed all DataFrames are in memory, but if not you can read them into a list, or do the reading in the reduce step.

from functools import reduce

my_dfs = [df.set_index('ID') for df in [df1, df2, df3, df4]]
#my_dfs = [pd.read_csv(file).set_index('ID') for file in your_list_of_files]

reduce(lambda l,r: l.combine_first(r), my_dfs)

       Color   Fruit  Price    Size
ID                                 
1001   Green   Apple    NaN   Large
1002     NaN  Banana   20.0     NaN
1003     NaN    Kiwi    NaN     NaN
1005     Red     NaN    NaN     NaN
1006  Orange     NaN    NaN     NaN
1007  Yellow     NaN    NaN     NaN
1008     NaN     NaN    NaN   Small
1009     NaN     NaN   40.0  Medium
1010     NaN     NaN   30.0   Large
1011     NaN     NaN   50.0     NaN

Upvotes: 1

kgolyaev
kgolyaev

Reputation: 565

Something like this should work:

import pandas as pd

list_of_csv_filenames = ['csv1.csv', 'csv2.csv', 'csv3.csv', 'csv4.csv']
all_dfs = []
for i in range(1, 5):
    temp = pd.read_csv(list_of_csv_filesnames[i-1])
    temp['Number'] = i
    all_dfs.append(temp)
full_df = pd.concat(all_dfs)
full_df.to_csv('output_filename.csv', index=False)

Upvotes: 0

Related Questions