Reputation: 380
Supposing using NumPy, given two matrixes mat_A
and mat_B
:
value
date
2020-10-16 49.78
2020-10-15 50.30
2020-10-14 50.44
2020-10-13 50.85
2020-10-12 51.42
... ...
2014-01-08 45.25
2014-01-07 45.18
2014-01-06 45.19
2014-01-03 45.21
2014-01-02 44.95
[1711 rows x 2 columns]
value
date
2020-10-16 95.0
2020-10-15 95.0
2020-10-14 95.0
2020-10-13 95.0
2020-10-12 95.0
... ...
2012-08-08 209.0
2012-08-07 209.0
2012-08-06 209.0
2012-08-03 209.0
2012-08-02 209.0
[1361 rows x 2 columns]
Is there an efficient way to get the following?
mat_A
and mat_B
:
date
as pivotmat_A
and mat_B
with value=None
mat_A
, and mat_B
Upvotes: 0
Views: 310
Reputation: 1520
To intersect you may use intersect1d
_, inter, _ = numpy.intersect1d(a[:, 0], b[:, 0], assume_unique=True, return_indices=True)
Which gives you indexes of a
in b
and vice versa. This one is pivoted.
And if you need not pivoted one then you may do it thrice
_, inter 1, _ = numpy.intersect1d(a[:, 1], b[:, 1], assume_unique=True, return_indices=True)
_, no_piv_inter, _ = numpy.intersect1d(inter1, inter2, assume_unique=True, return_indices=True)
For symmetric difference, there is setxor1d
diff = numpy.setxor1d(a[:, 0], b[:, 0], assume_unique=True, return_indices=True)
There is union1d
. But it seems to do not work for your case because it does not return indexes. To join you may use vstack
and unique
.
v = numpy.vstack((a, b))
_, unique = numpy.unique(v[:,0], axis=0, return_index=True)
union = v[unique, :]
Then you may fill the difference
diff_idx = numpy.in1d(union[:, 0], diff, assume_unique=True)
union[diff_idx, 0] = None
I did not run all codes. Do not beat me for syntax errors. The idea should be correct.
This way is not efficient in terms of complexity. I believe that the data structure is wrong. For the requested operation hash table that is dict in python are preferable. But these numpyed operations supposed to be fast relative to loops.
Upvotes: 1