Vitaly Isaev

Variable visibility issue with Pandas and IPython

I get a NameError exception when I try to filter a DataFrame by selected index values inside an IPython session. You can see that valid is a numpy.ndarray, while lab is a pandas.DataFrame object. Both of them are initialized and accessible, yet I cannot use them together. Here is the error:

In [51]: valid
Out[51]: 
array([38661, 44593, 38705, 38918, 38727, 38757, 38751, 38777, 38787,
       ...,    
       45328, 45337, 43645, 43694, 43701])

In [52]: lab
Out[52]: 
         0
39333   -1
39173   -1
42756   -1
39633   -1
38661   -1
44801   81
...    ...
39379   -1
39742   -1
44765  108
44279   -1
40584   -1
41047   -1
41833   98

[3299 rows x 1 columns]

In [53]: lab[lab.index.map(lambda x: x in valid)]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/django/core/management/commands/shell.pyc in <module>()
----> 1 lab[lab.index.map(lambda x: x in valid)]

/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/pandas/core/index.pyc in map(self, mapper)
   1558 
   1559     def map(self, mapper):
-> 1560         return self._arrmap(self.values, mapper)
   1561 
   1562     def isin(self, values, level=None):

/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.arrmap_int64 (pandas/algos.c:78469)()

/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/django/core/management/commands/shell.pyc in <lambda>(x)
----> 1 lab[lab.index.map(lambda x: x in valid)]

NameError: global name 'valid' is not defined

What's wrong with this code?

Update: data files: lab.pkl (pickle format), valid.npy (NumPy binary format).
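The traceback shows the lambda running in the frame of Django's shell.pyc, and the message complains about a *global* name. One plausible mechanism, assumed here rather than confirmed from Django's or IPython's source, is that the shell executes each input with separate globals and locals mappings. Names you assign land in the locals mapping, but a lambda's free names are looked up in the globals mapping, so the lambda cannot see valid. The sketch below reproduces this with plain exec in Python 3, where the message reads "name 'valid' is not defined" instead of Python 2's "global name ...". The names run, shell_globals and shell_locals are hypothetical.

```python
# Hypothetical illustration: run each "input cell" with *separate* globals and
# locals dictionaries (assumed, not confirmed, to be what the shell does).
shell_globals = {}
shell_locals = {}


def run(source):
    """Execute one input cell in the split namespace."""
    exec(source, shell_globals, shell_locals)


run("valid = [38661, 44593, 38705]")  # `valid` is stored in shell_locals only

# A lambda compiled at module level looks up its free names as globals,
# i.e. in shell_globals (and builtins), never in shell_locals.
run("f = lambda x: x in valid")
error = None
try:
    run("f(38661)")
except NameError as exc:
    error = str(exc)
    print("NameError:", error)

# Workaround sketch: bind `valid` at definition time through a default
# argument; the default is evaluated where shell_locals is visible.
run("g = lambda x, valid=valid: x in valid")
run("ok = g(38661)")
print(shell_locals["ok"])  # True
```

The default-argument trick copies a reference to the object when the lambda is created, so later rebinding of valid in the session is not seen by g.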

Answers (1)

Karthik V

It is not clear whether you are trying to add a new column to lab or to get the rows in the order specified by the valid array. To add a new column to lab you can do lab['new'] = valid. To get the rows of lab ordered according to the values in the valid array, you can do lab.loc[valid]. If you just want the raw NumPy array, do lab.loc[valid].values.
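For the filtering step in the question itself, pandas provides Index.isin (its definition is visible in the traceback above, in pandas/core/index.py). The minimal sketch below uses small hypothetical stand-ins for lab and valid (the real data are in lab.pkl and valid.npy) and contrasts isin, which keeps lab's row order, with .loc, which follows the order of valid.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: `lab` has integer index
# labels and a single column named 0; `valid` is a NumPy array of labels.
lab = pd.DataFrame({0: [-1, -1, 81, 108, -1]},
                   index=[39333, 38661, 44801, 44765, 40584])
valid = np.array([44765, 38661, 44801])

# Boolean mask: keep rows whose index label appears in `valid`. This is the
# vectorized equivalent of lab.index.map(lambda x: x in valid) and needs no
# lambda, so no closure has to look up the name `valid`.
filtered = lab[lab.index.isin(valid)]

# Label-based selection: rows come back in the order given by `valid`.
# Recent pandas versions raise KeyError if any label in `valid` is missing.
ordered = lab.loc[valid]

# Raw NumPy array of the selected values (shape: len(valid) x 1 here).
raw = lab.loc[valid].values

print(filtered.index.tolist())  # [38661, 44801, 44765] (original row order)
print(ordered.index.tolist())   # [44765, 38661, 44801] (order of `valid`)
```

Because isin needs no lambda, it also sidesteps the name lookup that fails in the question. Note that lab['new'] = valid only works when valid has exactly one entry per row of lab; otherwise pandas raises a ValueError.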