Reputation: 2407
I am trying to test the same example given in "Matrix search operation using numpy and pandas"
on 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:32:08 UTC 2012 i686 i686 i686 GNU/Linux
with python 2.7.3, numpy 1.9.2 and pandas 0.15.2.
For this small example:
ds1 = [[ 4, 13, 6, 9],
[ 7, 12, 5, 7],
[ 7, 0, 4, 22],
[ 9, 8, 12, 0]]
ds2 = [[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]]
ds1= pd.DataFrame(ds1)
ds2= pd.DataFrame(ds2)
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
print C
gives the wrong result:
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14]),
array([ 0, 7, 2, 5, 3, 6, 1, 3, 3, 0, 8, 5, 4, 6]))
The expected output is:
output = [[1, 2, 1, 3],
[2, 1, 3, 2],
[2, 0, 1, 3],
[3, 2, 1, 0]]
And when working with the larger matrices:
ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1', sep=' ', header=None)
ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2', header=None, sep=' ')
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
print C
it gives
(1000, 1001) (4000, 2)
(array([], dtype=int32),)
instead of the matrix with the replaced values.
Any suggestions would be very helpful.
Upvotes: 0
Views: 305
Reputation: 818
I agree with @Anthony Lethuillier's answer, and I guess the IndexError may be caused by a version difference. It seems that in @nlper's situation C is (array([], dtype=int32),), which means nothing matched in ds1.values.ravel()[:, None] == ds2.values[:, 0], and this is clearly different from @Anthony's result. Since nothing was found, C is a tuple that contains only one element, so an IndexError is triggered when you access C[1].
This also works on my machine, so I don't know why C is empty. I recommend printing ds1.values.ravel() and ds2.values[:, 0] in detail to see why nothing compares equal.
For reference, I use python 2.7.9, numpy 1.9.2 and pandas 0.16.1.
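One way to run that kind of check is sketched below on the small example from the question. The names flat, keys, n_matches and mask_bytes are only illustrative, and np.isin needs NumPy 1.13 or later, which is newer than the versions mentioned in this thread (np.in1d is the older equivalent).

```python
import numpy as np
import pandas as pd

ds1 = pd.DataFrame([[4, 13, 6, 9],
                    [7, 12, 5, 7],
                    [7, 0, 4, 22],
                    [9, 8, 12, 0]])
ds2 = pd.DataFrame([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                    [9, 3], [12, 1], [13, 2], [22, 3]])

flat = ds1.values.ravel()   # every ds1 value as a 1-D array
keys = ds2.values[:, 0]     # first column of ds2, the values to search for

# Compare dtypes and shapes first: a float column full of NaN, or strings
# read from a file, can make every comparison come out False.
print(flat.dtype, flat.shape)
print(keys.dtype, keys.shape)

# Count how many ds1 entries occur among the keys at all,
# without building the full 2-D comparison matrix.
n_matches = int(np.isin(flat, keys).sum())
print(n_matches)

# Size in bytes of the boolean matrix that flat[:, None] == keys would create.
mask_bytes = flat.size * keys.size
print(mask_bytes)
```

For the large data set the same arithmetic gives 1000 * 1001 * 4000 bytes, roughly 4 GB for the boolean matrix. That is more than a 32-bit (i686) process can address, so it might be why the comparison fails on that machine, but this is only a guess.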
Upvotes: 2
Reputation: 1539
The first array in C gives the positions in the flattened ds1 that have a match, and the second array in C (array([ 0, 7, 2, 5, 3, 6, 1, 3, 3, 0, 8, 5, 4, 6])) gives the rows of ds2 where each match was found.
So you have to replace the values of ds1.values.ravel() at the indices in the first array of C with the second-column values of ds2 at the indices in the second array of C.
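To make the pairing concrete, here is a tiny illustration with made-up stand-in arrays (flat and keys are hypothetical names, not part of the original code):

```python
import numpy as np

flat = np.array([4, 13, 0])   # stand-in for ds1.values.ravel()
keys = np.array([4, 5, 13])   # stand-in for ds2.values[:, 0]

# Broadcasting compares every element of flat with every key,
# producing a boolean matrix of shape (len(flat), len(keys)).
mask = flat[:, None] == keys

# np.where returns one index array per axis: rows index flat, cols index keys.
rows, cols = np.where(mask)
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)  # [(0, 0), (1, 2)]
```

The value 0 has no match among the keys, so its position (2) never appears in rows and that entry would be left unchanged.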
Here is the code that gives the right output for the small example, followed by the same steps for the large data set:
import pandas as pd
import numpy as np
ds1 = [[ 4, 13, 6, 9],
[ 7, 12, 5, 7],
[ 7, 0, 4, 22],
[ 9, 8, 12, 0]]
ds2 = [[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]]
ds1= pd.DataFrame(ds1)
ds2= pd.DataFrame(ds2)
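# C[0]: positions in the flattened ds1, C[1]: matching rows in ds2
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
ds1_new = ds1.values.ravel()
# Overwrite each matched entry with the second column of its ds2 row
ds1_new[C[0]]=ds2.values[C[1], 1]
ds1_new = ds1_new.reshape(4,4)
print(ds1_new)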
# Same steps for the large data set
ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1', sep=' ', header=None)
ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2', header=None, sep=' ')
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
ds1_new = ds1.values.ravel()
ds1_new[C[0]]=ds2.values[C[1], 1]
ds1_new = ds1_new.reshape(1000,1001)
print(ds1_new)
This gives the following output:
[[1 2 1 3]
[2 1 3 2]
[2 0 1 3]
[3 2 1 0]]
[[ 1. 1. 1. ..., 1. 1. nan]
[ 1. 1. 1. ..., 0. 1. nan]
[ 1. 0. 1. ..., 1. 0. nan]
...,
[ 1. 1. 1. ..., 0. 1. nan]
[ 1. 0. 1. ..., 1. 1. nan]
[ 1. 1. 1. ..., 0. 1. nan]]
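If the full broadcast comparison is too large for the available memory, a sorted lookup is one possible alternative. The following is only a sketch using np.searchsorted instead of the np.where approach above, and it assumes the keys in the first column of ds2 are unique. The names sorted_keys, sorted_vals, pos, found and result are illustrative.

```python
import numpy as np
import pandas as pd

ds1 = pd.DataFrame([[4, 13, 6, 9],
                    [7, 12, 5, 7],
                    [7, 0, 4, 22],
                    [9, 8, 12, 0]])
ds2 = pd.DataFrame([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                    [9, 3], [12, 1], [13, 2], [22, 3]])

keys = ds2.values[:, 0]
vals = ds2.values[:, 1]

# Sort the keys once so each ds1 value can be located by binary search.
order = np.argsort(keys)
sorted_keys = keys[order]
sorted_vals = vals[order]

flat = ds1.values.ravel()

# Candidate position of each ds1 value among the sorted keys;
# clip so values larger than every key still index a valid slot.
pos = np.searchsorted(sorted_keys, flat)
pos = np.clip(pos, 0, len(sorted_keys) - 1)

# A value is a real match only if the key at its candidate position equals it.
found = sorted_keys[pos] == flat

# Work on a copy so ds1 itself is left untouched.
result = flat.copy()
result[found] = sorted_vals[pos[found]]
result = result.reshape(ds1.shape)
print(result)
```

On the small example this reproduces the expected matrix, with unmatched entries such as 0 left as they are. Memory use grows with the size of ds1 plus the size of ds2 rather than with their product.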
Upvotes: 1