Reputation: 41
I have 4 different csv files.
csv1:
ID Fruit
1001 Apple
1002 Banana
1003 Kiwi
csv2:
ID Color
1001 Green
1005 Red
1006 Orange
1007 Yellow
csv3:
ID Size
1001 Large
1008 Small
1009 Medium
1010 Large
csv4:
ID Price
1002 20
1009 40
1010 30
1011 50
And this is a master csv file that I want to make:
Number ID Fruit Color Size Price
1 1001 Apple Green Large
2 1002 Banana 20
3 1003 Kiwi
4 1005 Red
5 1006 Orange
6 1007 Yellow
7 1008 Small
8 1009 Medium 40
9 1010 Large 30
10 1011 50
I think using pandas will be easier to make it, but I have no idea on Python.
As each csv file has different columns, how can I choose the column and paste all of them together on master csv file? If there is no information, I want to make it NULL or N/A value.
You can see the each of csv files and the master: Click here for image
I spent 6 hours already but I have no idea how to do this.
Thank you in advance.
Upvotes: 4
Views: 3235
Reputation: 31
Concat can be used to append data to dataframe.
file_path1 = r"path1.csv"
df1 = pd.read_csv(file_path1)
file_path2 = r"path2.csv"
df2 = pd.read_csv(file_path2)
file_path3 = r"path3.csv"
df3 = pd.read_csv(file_path3)
frames = [df1,df2,df3]
df = pd.concat(frames)
reset index after concat
df.reset_index(inplace=True)
Upvotes: 0
Reputation: 1986
You can use merge:
import pandas as pd
df1 = pd.read_csv('1.csv')
df2 = pd.read_csv('2.csv')
df3 = pd.read_csv('3.csv')
df4 = pd.read_csv('4.csv')
df = df1.merge(df2).merge(df3).merge(df4)
df.to_csv('result.csv')
Upvotes: 0
Reputation: 59549
reduce
+ combine_first
The key is to set 'ID'
as the index that way we get the proper alignment across both axes. I've assumed all DataFrames are in memory, but if not you can read them into a list, or do the reading in the reduce
step.
from functools import reduce
my_dfs = [df.set_index('ID') for df in [df1, df2, df3, df4]]
#my_dfs = [pd.read_csv(file).set_index('ID') for file in your_list_of_files]
reduce(lambda l,r: l.combine_first(r), my_dfs)
Color Fruit Price Size
ID
1001 Green Apple NaN Large
1002 NaN Banana 20.0 NaN
1003 NaN Kiwi NaN NaN
1005 Red NaN NaN NaN
1006 Orange NaN NaN NaN
1007 Yellow NaN NaN NaN
1008 NaN NaN NaN Small
1009 NaN NaN 40.0 Medium
1010 NaN NaN 30.0 Large
1011 NaN NaN 50.0 NaN
Upvotes: 1
Reputation: 565
Something like this should work:
import pandas as pd
list_of_csv_filenames = ['csv1.csv', 'csv2.csv', 'csv3.csv', 'csv4.csv']
all_dfs = []
for i in range(1, 5):
temp = pd.read_csv(list_of_csv_filesnames[i-1])
temp['Number'] = i
all_dfs.append(temp)
full_df = pd.concat(all_dfs)
full_df.to_csv('output_filename.csv', index=False)
Upvotes: 0