nlper

Reputation: 2407

matrix operation using numpy pandas

I am trying to reproduce the example from the question "Matrix search operation using numpy and pandas"

on 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:32:08 UTC 2012 i686 i686 i686 GNU/Linux with Python 2.7.3, numpy 1.9.2 and pandas 0.15.2.

For this small example:

import numpy as np
import pandas as pd

ds1 = [[ 4, 13,  6,  9],
       [ 7, 12,  5,  7],
       [ 7,  0,  4, 22],
       [ 9,  8, 12,  0]]
ds2 = [[ 4,  1],
       [ 5,  3],
       [ 6,  1],
       [ 7,  2],
       [ 8,  2],
       [ 9,  3],
       [12,  1],
       [13,  2],
       [22,  3]]

ds1= pd.DataFrame(ds1)
ds2= pd.DataFrame(ds2)
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
print C

This gives index arrays rather than the matrix I expect:

(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8, 10, 11, 12, 13, 14]), 
 array([ 0, 7, 2, 5, 3, 6, 1, 3, 3, 0, 8, 5, 4, 6]))

The expected output is:

output = [[1, 2, 1, 3],
          [2, 1, 3, 2],
          [2, 0, 1, 3],
          [3, 2, 1, 0]]

When I run the same code on larger matrices:

ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1', sep=' ', header=None)
ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2', header=None, sep=' ')
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
print C

it gives

(1000, 1001) (4000, 2)
(array([], dtype=int32),)

instead of the matrix with the values replaced.

Any suggestions would be very helpful.

Upvotes: 0

Views: 305

Answers (2)

seven7e

Reputation: 818

I agree with @Anthony Lethuillier's answer, and my guess is that the IndexError is caused by a difference in versions. In @nlper's case, C is (array([], dtype=int32),), which means nothing matched in ds1.values.ravel()[:, None] == ds2.values[:, 0]; this is clearly different from @Anthony's result. Because nothing matched, C is a tuple with only one element, so accessing C[1] raises an IndexError.

The code also works on my machine, so I don't know why C is empty for you. I recommend printing ds1.values.ravel() and ds2.values[:, 0] and checking why no values compare equal.

For reference, I am using Python 2.7.9, numpy 1.9.2 and pandas 0.16.1.
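
One thing that may be worth checking, and this is only a guess that nobody in this thread has confirmed: the broadcast comparison allocates one boolean per (ds1 value, ds2 key) pair. For the shapes printed in the question that comes to about 4 GB, which a 32-bit (i686) process cannot allocate. As far as I know, numpy releases from that period could return a plain False from a failed elementwise == (with a warning) instead of raising, which would be consistent with the empty one-element tuple. The sketch below only estimates the size of that boolean array; mask_report is a hypothetical helper name, not part of numpy or pandas.

```python
import numpy as np

def mask_report(flat_values, keys):
    """Hypothetical helper: describe the boolean array that
    flat_values[:, None] == keys would have to allocate."""
    flat_values = np.asarray(flat_values)
    keys = np.asarray(keys)
    return {
        "flat_dtype": str(flat_values.dtype),
        "keys_dtype": str(keys.dtype),
        "mask_shape": (flat_values.size, keys.size),
        "mask_bytes": flat_values.size * keys.size,  # one byte per boolean
    }

# Same sizes as the small example in the question: 16 values against 9 keys.
small = mask_report(np.arange(16), np.arange(9))
print(small)

# Sizes printed in the question: ds1 is (1000, 1001), ds2 is (4000, 2).
# Plain integer arithmetic, so nothing large is allocated here.
big_mask_bytes = 1000 * 1001 * 4000
print(big_mask_bytes)  # 4004000000 bytes, roughly 4 GB
```

Running mask_report(ds1.values.ravel(), ds2.values[:, 0]) on the real data would also show the two dtypes, which helps to rule out a string-versus-number mismatch.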

Upvotes: 2

Anthony Lethuillier

Reputation: 1539

The first array in C gives the positions in ds1.values.ravel() that have a match in the first column of ds2. The second array in C (array([ 0, 7, 2, 5, 3, 6, 1, 3, 3, 0, 8, 5, 4, 6])) gives, for each of those positions, the index of the matching row in ds2.

So you have to replace the values of ds1.values.ravel() at the indices in the first array of C with the values from the second column of ds2 at the indices in the second array of C.
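
To make the two index arrays concrete, here is a minimal sketch on three values and three keys. The names flat, keys, mask, rows and cols are only for illustration and do not appear in the question.

```python
import numpy as np

flat = np.array([4, 13, 0])   # stands in for ds1.values.ravel()
keys = np.array([4, 5, 13])   # stands in for ds2.values[:, 0]

# Broadcasting compares every value in flat with every key:
# mask[i, j] is True when flat[i] == keys[j], so mask has shape (3, 3).
mask = flat[:, None] == keys

# np.where on a 2-D boolean array returns (row indices, column indices).
rows, cols = np.where(mask)

print(rows)  # positions in flat that have a match
print(cols)  # index of the matching key for each of those positions
```

Here 4 matches keys[0] and 13 matches keys[2], while 0 has no match and so does not appear in rows at all.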

Here is the code that gives the right output for the small example, followed by the same steps for the large data set:

import pandas as pd
import numpy as np

ds1 = [[ 4, 13,  6,  9],
       [ 7, 12,  5,  7],
       [ 7,  0,  4, 22],
       [ 9,  8, 12,  0]]

ds2 = [[ 4,  1],
       [ 5,  3],
       [ 6,  1],
       [ 7,  2],
       [ 8,  2],
       [ 9,  3],
       [12,  1],
       [13,  2],
       [22,  3]]

ds1= pd.DataFrame(ds1)
ds2= pd.DataFrame(ds2)

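# C[0]: positions in the flattened ds1 that match a key in ds2[:, 0]
# C[1]: index of the matching row in ds2
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])

# ravel() can return a view of ds1's data, so work on a copy
ds1_new = ds1.values.ravel().copy()

# overwrite the matched positions with the second column of ds2
ds1_new[C[0]] = ds2.values[C[1], 1]

ds1_new = ds1_new.reshape(4, 4)

print(ds1_new)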

ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1', sep=' ', header=None)
ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2', header=None, sep=' ')

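# same steps as for the small example, applied to the data read from the gists
C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])

# ravel() can return a view of ds1's data, so work on a copy
ds1_new = ds1.values.ravel().copy()

ds1_new[C[0]] = ds2.values[C[1], 1]

ds1_new = ds1_new.reshape(1000, 1001)

print(ds1_new)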

This gives the following output:

[[1 2 1 3]
 [2 1 3 2]
 [2 0 1 3]
 [3 2 1 0]]
[[  1.   1.   1. ...,   1.   1.  nan]
 [  1.   1.   1. ...,   0.   1.  nan]
 [  1.   0.   1. ...,   1.   0.  nan]
 ..., 
 [  1.   1.   1. ...,   0.   1.  nan]
 [  1.   0.   1. ...,   1.   1.  nan]
 [  1.   1.   1. ...,   0.   1.  nan]]
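
If the full comparison matrix is too large for the machine, a different technique that avoids building it is a sorted lookup with np.searchsorted. This is a sketch of an alternative, not the method used above; replace_by_lookup is a hypothetical helper name, and it assumes the keys in the first column of ds2 are unique.

```python
import numpy as np

ds1 = np.array([[ 4, 13,  6,  9],
                [ 7, 12,  5,  7],
                [ 7,  0,  4, 22],
                [ 9,  8, 12,  0]])
ds2 = np.array([[ 4, 1], [ 5, 3], [ 6, 1], [ 7, 2], [ 8, 2],
                [ 9, 3], [12, 1], [13, 2], [22, 3]])

def replace_by_lookup(values, table):
    """Hypothetical helper: map each value through table[:, 0] -> table[:, 1].
    Values without a matching key are left unchanged."""
    order = np.argsort(table[:, 0])        # searchsorted needs sorted keys
    keys = table[order, 0]
    new_vals = table[order, 1]

    flat = values.ravel()
    pos = np.searchsorted(keys, flat)      # candidate key position per value
    pos = np.clip(pos, 0, len(keys) - 1)   # keep indices inside the key array
    found = keys[pos] == flat              # True only for exact matches

    out = flat.copy()                      # do not modify the input
    out[found] = new_vals[pos[found]]
    return out.reshape(values.shape)

result = replace_by_lookup(ds1, ds2)
print(result)
```

The temporary arrays here have the same length as the flattened ds1, so memory use grows with the size of ds1 rather than with the size of ds1 times the number of keys.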

Upvotes: 1
