Reputation: 123
I have two csv files like below that I'd like to merge - more or less using the first column ID_ as the unique identifier, and append the AMT column to a new column in the final file.
CSV1
ID_ CUSTOMER_ID_ EMAIL_ADDRESS_
1090 1 [email protected]
1106 2 [email protected]
1145 3 [email protected]
1206 4 [email protected]
1247 5 [email protected]
1254 6 [email protected]
1260 7 [email protected]
1361 8 [email protected]
1376 9 [email protected]
CSV2
ID_ AMT
1090 5
1106 5
1145 5
1206 5
1247 5
1254 65
1260 5
1361 10
1376 5
Here's what I'm looking for in a final file:
ID_ CUSTOMER_ID_ EMAIL_ADDRESS_ AMT
1090 1 [email protected] 5
1106 2 [email protected] 5
1145 3 [email protected] 5
1206 4 [email protected] 5
1247 5 [email protected] 5
1254 6 [email protected] 65
1260 7 [email protected] 5
1361 8 [email protected] 10
1376 9 [email protected] 5
I've tried modifying a this below as much as possible, but not able to get what I'm looking for. Really stuck on this - not sure what else I can do. Really appreciate any and all help!
join -t, File1.csv File2.csv
Data shows in this example contains tabs, but my actual files are CSVs as mentioned and will contain commas as a separator.
Upvotes: 12
Views: 17480
Reputation: 1018
You might want to use the pandas library
import pandas as pd
# Read from CSV files
df1 = pd.read_csv("CSV1.csv")
df2 = pd.read_csv("CSV2.csv")
# Merge the DataFrames on the "ID_" column
merged_df = pd.merge(df1, df2, on="ID_", how="left")
# Save the merged DataFrame into the new new CSV file
merged_df.to_csv("Output.csv", index=False)
# print the result here
print(merged_df)
Upvotes: -1
Reputation: 346
This can be easily done using Pandas library. Here is my code to do this:
'''
This program reads two csv files and merges them based on a common key column.
'''
# import the pandas library
# you can install using the following command: pip install pandas
import pandas as pd
# Read the files into two dataframes.
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')
# Merge the two dataframes, using _ID column as key
df3 = pd.merge(df1, df2, on = 'ID_')
df3.set_index('ID_', inplace = True)
# Write it to a new CSV file
df3.to_csv('CSV3.csv')
You can find a short tutorial on pandas here: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
Upvotes: 23