Zanam

Reputation: 4807

Python: Intersection of Two 2D Arrays

I have data in a .csv file called 'Max.csv':

Valid Date  MAX
1/1/1995    51
1/2/1995    45
1/3/1995    48
1/4/1995    45

Another .csv file called 'Min.csv' looks like:

Valid Date  MIN
1/2/1995    33
1/4/1995    31
1/5/1995    30
1/6/1995    39

I want to generate two dictionaries (or any other suggested data structure) so that I have two separate variables, Max and Min, in Python that look like this:

Valid Date  MAX
1/2/1995    45
1/4/1995    45

Valid Date  MIN
1/2/1995    33
1/4/1995    31

i.e. select the elements from Max and Min so that only the common elements are output.

I am thinking about using numpy.intersect1d, but that means I have to separately compare the Max and Min first on date column, find the index of common dates and then grab the second columns for Max and Min. This appears too complicated and I feel there are smarter ways to intersect two curves Max and Min.

Upvotes: 1

Views: 1019

Answers (2)

Eelco Hoogendoorn

Reputation: 10759

You mention that:

I have to separately compare the Max and Min first on date column, find the index of common dates and then grab the second columns for Max and Min. This appears too complicated...

Indeed this is fundamentally what you need to do, one way or the other; but using the numpy_indexed package (disclaimer: I am its author), this isn't complicated in the slightest:

import numpy_indexed as npi
common_dates = npi.intersection(min_dates, max_dates)
print(max_values[npi.indices(max_dates, common_dates)])
print(min_values[npi.indices(min_dates, common_dates)])

Note that this solution is fully vectorized (it contains no loops at the Python level), so on large inputs it should be much faster than the currently accepted answer.

Note 2: this assumes the date columns are unique; if they are not, you should replace 'npi.indices' with 'npi.in_'.
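
For readers without the package, the three steps quoted from the question (find the common dates, find their positions, grab the matching values) can be sketched in plain Python. This is not vectorized and is not how numpy_indexed works internally; it only illustrates the logic. The list names and the sample data layout are assumptions for illustration, and the sketch assumes each date appears at most once per column.

```python
# Sample data from the question, held as parallel lists (names are illustrative).
max_dates = ["1/1/1995", "1/2/1995", "1/3/1995", "1/4/1995"]
max_values = [51, 45, 48, 45]
min_dates = ["1/2/1995", "1/4/1995", "1/5/1995", "1/6/1995"]
min_values = [33, 31, 30, 39]

# Step 1: dates present in both columns, kept in max_dates order.
min_date_set = set(min_dates)
common_dates = [d for d in max_dates if d in min_date_set]

# Step 2: position of each date within its own column (assumes unique dates).
max_pos = {d: i for i, d in enumerate(max_dates)}
min_pos = {d: i for i, d in enumerate(min_dates)}

# Step 3: pick the values at those positions.
common_max = [max_values[max_pos[d]] for d in common_dates]
common_min = [min_values[min_pos[d]] for d in common_dates]

print(common_dates)  # ['1/2/1995', '1/4/1995']
print(common_max)    # [45, 45]
print(common_min)    # [33, 31]
```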

Upvotes: 2

JeanPaulDepraz

Reputation: 669

The built-in set() should be enough, as follows:

>>> max_temps = {"1/1/1995":"51", "1/2/1995":"45", "1/3/1995":"48", "1/4/1995":"45"}
>>> min_temps = {"1/2/1995":"33", "1/4/1995":"31", "1/5/1995":"30", "1/6/1995":"39"}

>>> a = set(max_temps)
>>> b = set(min_temps)
>>> {x: max_temps[x] for x in a.intersection(b)}
{'1/4/1995': '45', '1/2/1995': '45'}
>>> {x: min_temps[x] for x in a.intersection(b)}
{'1/2/1995': '33', '1/4/1995': '31'}
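
The dictionaries above are typed in by hand. A minimal sketch of building them from the two files with the standard csv module might look like the following. It assumes the files are comma-separated with the header rows shown in the question; the in-memory strings stand in for 'Max.csv' and 'Min.csv', and the helper name read_column is hypothetical.

```python
import csv
import io

# Stand-ins for Max.csv and Min.csv; assumed comma-separated with a header row.
MAX_CSV = "Valid Date,MAX\n1/1/1995,51\n1/2/1995,45\n1/3/1995,48\n1/4/1995,45\n"
MIN_CSV = "Valid Date,MIN\n1/2/1995,33\n1/4/1995,31\n1/5/1995,30\n1/6/1995,39\n"

def read_column(handle, value_field):
    """Map each 'Valid Date' to the integer found in value_field."""
    return {row["Valid Date"]: int(row[value_field])
            for row in csv.DictReader(handle)}

max_by_date = read_column(io.StringIO(MAX_CSV), "MAX")
min_by_date = read_column(io.StringIO(MIN_CSV), "MIN")

# Dict key views support set operations, so no explicit set() call is needed.
common = max_by_date.keys() & min_by_date.keys()

common_max = {d: max_by_date[d] for d in common}
common_min = {d: min_by_date[d] for d in common}
```

With real files, the io.StringIO(...) arguments would be replaced by file objects opened with open('Max.csv', newline='') and open('Min.csv', newline='').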

Upvotes: 1
