Reputation: 821
I have two CSVs. They contain the same columns and data, except that one CSV has additional records. I want a single CSV containing only the new records, with all duplicate records dropped.
I have:
import pandas as pd
rows = pd.read_csv('/home/test/Documents/rows.csv')
rowsadded = pd.read_csv('/home/test/Documents/rowsadded.csv')
joined = rows.append(rowsadded)
reduce = joined.drop_duplicates(subset=None, keep=False, inplace=False)
reduce.to_csv('/home/test/Documents/results.csv')
When I set keep=False, all records are dropped and only the column names are kept in the output.
Does anyone have advice on dropping the duplicate records after appending the CSVs?
UPDATE - Altering the code as follows appends the new rows from the 'rowsadded' CSV to 'rows':
reduce = joined.drop_duplicates(keep=False, inplace=True)
What am I doing wrong? I want to drop the duplicates, keep only the new rows, and write that result to a new CSV.
Upvotes: 0
Views: 1659
Reputation: 775
Try it all in one go:
pd.concat([df1,df2]).drop_duplicates(keep=False)
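A fuller sketch of the same approach, using small inline frames in place of the question's CSV files (the column names here are illustrative). With keep=False, drop_duplicates removes every record that appears in both frames, so only the genuinely new rows survive; note this assumes the original CSV contains no internal duplicates of its own:

```python
import pandas as pd

# Stand-ins for the question's two CSVs; in practice these
# would come from pd.read_csv('rows.csv') etc.
rows = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
rowsadded = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# keep=False drops every record occurring more than once across
# the concatenated frame, leaving only rows unique to rowsadded.
new_rows = pd.concat([rows, rowsadded]).drop_duplicates(keep=False)

# index=False avoids writing the pandas index as an extra column.
new_rows.to_csv('results.csv', index=False)
```

Also note that drop_duplicates with inplace=True returns None, which is why assigning its result to a variable (as in the question's update) leaves you with nothing to write out; either use inplace=True and keep working with the original frame, or assign the return value of the default inplace=False call.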
Upvotes: 1