user10358702
user10358702

Reputation:

How to compare two CSV files by column and save the differences in csv file using pandas python

I have two csv files like the following

File1

x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3

File2

x1
x4
x5

I would like to create a new file that contains

x2
x3
x6

using pandas or python

Upvotes: 0

Views: 85

Answers (1)

jezrael
jezrael

Reputation: 862641

Use Series.isin with ~ for filtering of values not existing in df1[0] - in first column with DataFrame.loc and boolean indexing:

import pandas as pd

#create DataFrame from first file
df1 = pd.read_csv(file1, sep=";", header=None)
print (df1)
    0     1   2
0  x1  10.0  a1
1  x2  10.0  a2
2  x3  11.0  a1
3  x4  10.5  a2
4  x5  10.0  a3
5  x6  12.0  a3

#create DataFrame from second file
df2 = pd.read_csv(file2, header=None, sep='|')
print (df2)
    0
0  x1
1  x4
2  x5

s = df1.loc[~df1[0].isin(df2[0]), 0]
print (s)
1    x2
2    x3
5    x6
Name: 0, dtype: object

#write to file
s.to_csv('new.csv', index=False, header=False)

Upvotes: 1

Related Questions