Reputation: 88
I want to update specific columns of one .csv file from the columns of another .csv file. However, when I run the script below, after lots of trials, the output .csv either loses its old fields or ends up with the old fields duplicated. I also don't want fields that were already added from the input file to be added again when the script runs a second time.
import csv

# fieldnames is the list of column headers in 'Shipping CSV.csv' (defined earlier)
with open('Shop Export File.csv', 'r', encoding='utf-8-sig') as shop_file:
    with open('Shipping CSV.csv', 'a', encoding='utf-8-sig', newline='') as shipping_file:
        shop_csv = csv.DictReader(shop_file)
        shipping_csv = csv.DictWriter(shipping_file, fieldnames=fieldnames)
        for r in shop_csv:
            data_input = {
                'Deliver To Name': r['Delivery Customer'],
                'Deliver To Business Name': r['Delivery Company Name'],
                'Deliver To State': r['Delivery State'].strip('AU-'),
                'Deliver To Suburb': r['Delivery City'],
                'Deliver To Postcode': r['Delivery Zip Code'].strip('"'),
                'Deliver To Phone Number': r['Shipping Phone'].strip('"')
            }
            shipping_csv.writerow(data_input)
The 'Deliver To ...' names are the fieldnames in the output .csv, and the names on the right are the fieldnames from the input .csv.
Put another way: I only want specific columns of the output csv file completely overwritten by the corresponding columns of the input csv file, with no change to the other columns.
Upvotes: 0
Views: 449
Reputation: 88
I was able to find the solution by renaming the column headers from file1 to the same names as the headers of file2, so the rows could be appended using the csv.DictReader and csv.DictWriter methods.
Finally, I was able to prevent duplication of pre-existing column data by checking whether the name in each row is already in the list of existing names; if it is, the whole row is skipped and the script moves on to the next name and its row.
import csv

try:
    # Collect the names already present in the output file so re-running the
    # script does not create duplicate rows.
    with open('Shipping CSV.csv', 'r', encoding='utf-8-sig') as shipping_read:
        shipping_old_data = csv.reader(shipping_read)
        fieldnames = next(shipping_old_data)  # header row of the output file
        shipping_names = []
        for row in shipping_old_data:
            shipping_names.append(row[12])  # column holding 'Deliver To Name'

    with open('Shop Export File.csv', 'r', encoding='utf-8-sig') as shop_file:
        with open('Shipping CSV.csv', 'a', encoding='utf-8-sig', newline='') as shipping_file:
            shop_rows_as_dicts = csv.DictReader(shop_file)
            rows = []
            for row in shop_rows_as_dicts:
                phone_number = row['Shipping Phone'].strip('"')
                zip_code = row['Delivery Zip Code'].strip('"')
                rows.append({
                    'Deliver To Name': row['Delivery Customer'],
                    'Deliver To Business Name': row['Delivery Company Name'],
                    'Deliver To State': row['Delivery State'].strip('AU-'),
                    'Deliver To Suburb': row['Delivery City'],
                    'Deliver To Postcode': f"'{zip_code}",
                    'Deliver To Phone Number': f"'{phone_number}",
                    'Deliver To Address Line 1': row['Delivery Street Name&Number'],
                    'Item Description': row['Item\'s Variant'].replace('Product', '')})

            shipping_csv = csv.DictWriter(shipping_file, fieldnames=fieldnames)
            counter = 0
            for row in rows:
                if row['Deliver To Name'] in shipping_names:
                    continue  # row already exists in the output file, skip it
                counter += 1
                shipping_csv.writerow(row)
                print(f'[+] Successfully transferred row {counter}')
            input('\n[+] Process Finished.\nPress enter to exit.')
except FileNotFoundError:
    input('File names might have changed. Press Enter to Exit.')
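A small refinement of the duplicate check (a sketch, not part of the original solution): reading the existing names with csv.DictReader and storing them in a set avoids relying on the column sitting at index 12 and makes the membership test faster. It assumes the output file keeps a 'Deliver To Name' header:

import csv

# Sketch only: assumes 'Shipping CSV.csv' has a 'Deliver To Name' column header.
def existing_names(path='Shipping CSV.csv'):
    with open(path, 'r', encoding='utf-8-sig', newline='') as f:
        reader = csv.DictReader(f)
        # A set gives O(1) membership checks when filtering new rows.
        return {row['Deliver To Name'] for row in reader}

shipping_names = existing_names()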
Thanks to the answer above by scotscotmcc, it helped me get to this solution as well!
Upvotes: 0
Reputation: 3113
If I'm understanding right, another way to phrase what you ultimately want is the two files put together, but with duplicates removed, right? From what I'm reading, it also sounds like you don't have cases where a single field is being changed - is that right?
If that is the case, you may be better off using the pandas library to bring in both files as dataframes, then using append to, well, append the two dataframes, and then drop_duplicates.
Pandas can also read the csv files and write/append your final file.
import pandas as pd

# this has saved the data from the two different csv files as two dataframes
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# rename the columns in df1 so you can append and they will line up
df1.rename(columns={
    'Deliver To Name': 'Delivery Customer',
    'Deliver To Business Name': 'Delivery Company Name',
    'Deliver To State': 'Delivery State',
    'Deliver To Suburb': 'Delivery City',
    'Deliver To Postcode': 'Delivery Zip Code',
    'Deliver To Phone Number': 'Shipping Phone'
}, inplace=True)

df1['Delivery State'] = df1['Delivery State'].str.strip('AU-')
df1['Delivery Zip Code'] = df1['Delivery Zip Code'].str.strip('"')
df1['Shipping Phone'] = df1['Shipping Phone'].str.strip('"')

# everything in df2 with df1 now appended. any columns that were different
# between the two will remain, but will have NaN or None for the values from the other df
df2 = df2.append(df1)

# looks for every field to be a duplicate. you can specify which columns to look at
df2 = df2.drop_duplicates()

df2.to_csv('file2.csv', index=False)  # index=False keeps pandas from writing the row index as an extra column
This solution still does not handle the case where a record already exists in your second file and you want to update it; it just takes the new info from the first file and adds it to the second (with the cleanup). See the sketch below for one way to approach updates.
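If you do want existing records updated, here is a minimal sketch (not part of the original answer) that continues from the df1/df2 built above and assumes 'Delivery Customer' uniquely identifies a record. It uses pd.concat because DataFrame.append was removed in pandas 2.0; keeping the last occurrence of each key means an updated row from file1 replaces the older row from file2:

import pandas as pd

# Sketch only: continues from the df1/df2 above (after the rename and cleanup),
# and assumes 'Delivery Customer' uniquely identifies a record.
# Newer rows (df1) go last; keeping the last occurrence of each key means an
# existing record is replaced by its updated version from file1.
merged = pd.concat([df2, df1], ignore_index=True)
merged = merged.drop_duplicates(subset='Delivery Customer', keep='last')
merged.to_csv('file2.csv', index=False)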
Upvotes: 1