A S

Reputation: 103

How to find intersection or subset of two CSV files

I have two CSV files, each with two columns and a large number of rows. The first column is an id and the second is a set of value pairs, e.g.:

CSV1:

1 {[1,2],[1,4],[5,6],[3,1]}

2 {[2,4] ,[6,3], [8,3]}

3 {[3,2], [5,2], [3,5]}

CSV2:

1 {[2,4] ,[6,3], [8,3]}

2 {[3,4] ,[3,3], [2,3]}

3 {[1,4],[5,6],[3,1],[5,5]}

Now I need to produce a CSV file containing the sets that appear in both files, either as exact matches or as subsets shared by a row in each file.

Here the result should be:

{[2,4] ,[6,3], [8,3]}

{[1,4],[5,6],[3,1]}

Can anyone suggest Python code to do this?

Upvotes: 3

Views: 2862

Answers (1)

agold

Reputation: 6276

As suggested by this answer, you can use set.intersection to get the intersection of two sets. However, this does not work when the items are lists, because lists are unhashable and cannot be members of a set. Instead, you can use filter (comparable to this answer):

>>> l1 = [[1,2],[1,4],[5,6],[3,1]]
>>> l2 = [[1,4],[5,6],[3,1],[5,5]]
>>> filter(lambda q: q in l2, l1) 
[[1, 4], [5, 6], [3, 1]]

In Python 3, filter returns an iterator rather than a list, so wrap the call in list:

>>> list(filter(lambda x: x in l2,l1))
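If the order of the result does not matter, another option is to convert each inner list to a tuple, which is hashable, so that set intersection becomes available. This is only a minimal sketch that reuses l1 and l2 from above; the names s1, s2, common and common_sorted are made up for illustration:

```python
l1 = [[1, 2], [1, 4], [5, 6], [3, 1]]
l2 = [[1, 4], [5, 6], [3, 1], [5, 5]]

# Lists are unhashable, so turn every pair into a tuple first.
s1 = {tuple(pair) for pair in l1}
s2 = {tuple(pair) for pair in l2}

# Set intersection; membership tests are O(1) on average.
common = s1 & s2

# Sets are unordered, so sort if a stable order is needed.
common_sorted = sorted(common)
```

Compared with the filter approach, this avoids scanning l2 once per item, which may matter for a large number of rows, but it does not preserve the original order of l1.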

You can load the CSV files (if they really are separated by commas or some other delimiter) with csv.reader or pandas.read_csv, for example.
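If the files look exactly like the sample in the question (an id followed by a brace-wrapped list of pairs), they are not really comma-separated columns, and a regular expression may be easier than a CSV parser. The following is a rough sketch under that assumption; parse_pairs and common_subsets are hypothetical helper names, and the two lists of strings stand in for the lines of the real files:

```python
import re

# Matches one "[a,b]" pair, allowing optional spaces around the numbers.
PAIR = re.compile(r"\[\s*(-?\d+)\s*,\s*(-?\d+)\s*\]")

def parse_pairs(line):
    """Return the [a,b] pairs found in one line as a list of int tuples."""
    return [(int(a), int(b)) for a, b in PAIR.findall(line)]

def common_subsets(lines1, lines2):
    """Compare every row of lines1 with every row of lines2 and
    collect each non-empty set of shared pairs (in lines1 order)."""
    rows2 = [set(parse_pairs(line)) for line in lines2]
    results = []
    for line in lines1:
        pairs1 = parse_pairs(line)
        for pairs2 in rows2:
            shared = [p for p in pairs1 if p in pairs2]
            if shared:
                results.append(shared)
    return results

# Sample data copied from the question.
csv1 = [
    "1 {[1,2],[1,4],[5,6],[3,1]}",
    "2 {[2,4] ,[6,3], [8,3]}",
    "3 {[3,2], [5,2], [3,5]}",
]
csv2 = [
    "1 {[2,4] ,[6,3], [8,3]}",
    "2 {[3,4] ,[3,3], [2,3]}",
    "3 {[1,4],[5,6],[3,1],[5,5]}",
]

matches = common_subsets(csv1, csv2)
```

For the sample data, matches holds the two sets given in the question as the expected result. With real files, the open file objects can be passed in place of the lists, since both are iterated line by line. Note that this compares every row with every other row, which may be slow for very large files.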

Upvotes: 3
