Reputation: 5795
I get a NameError exception when I try to filter a DataFrame by selected index values (inside an IPython session). You can see that valid is a numpy.array while lab is a pandas.DataFrame object. Both of them are initialized and accessible. However, I cannot put them together. Here is the error:
In [51]: valid
Out[51]:
array([38661, 44593, 38705, 38918, 38727, 38757, 38751, 38777, 38787,
...,
45328, 45337, 43645, 43694, 43701])
In [52]: lab
Out[52]:
0
39333 -1
39173 -1
42756 -1
39633 -1
38661 -1
44801 81
... ...
39379 -1
39742 -1
44765 108
44279 -1
40584 -1
41047 -1
41833 98
[3299 rows x 1 columns]
In [53]: lab[lab.index.map(lambda x: x in valid)]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/django/core/management/commands/shell.pyc in <module>()
----> 1 lab[lab.index.map(lambda x: x in valid)]
/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/pandas/core/index.pyc in map(self, mapper)
1558
1559 def map(self, mapper):
-> 1560 return self._arrmap(self.values, mapper)
1561
1562 def isin(self, values, level=None):
/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.arrmap_int64 (pandas/algos.c:78469)()
/home/vitaly/progs/vnii_gochs/venv/lib/python2.7/site-packages/django/core/management/commands/shell.pyc in <lambda>(x)
----> 1 lab[lab.index.map(lambda x: x in valid)]
NameError: global name 'valid' is not defined
What's wrong with this code?
UPD: lab.pkl (pickle format), valid.npy (numpy binary format)
Upvotes: 0
Views: 89
Reputation: 1897
It is not clear whether you are trying to add a new column to lab or to get values in the order specified by the valid array. To add a new column to lab you can do lab['new'] = valid (this only works if valid has the same length as lab). To get the rows of lab ordered according to the values in the valid array you can do lab.loc[valid]. If you just want the raw numpy array, do lab.loc[valid].values.
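As a minimal sketch of the lookups described above, and of a filter that does not need a lambda (so it may sidestep the name lookup that failed in the question), the following uses small made-up stand-ins for lab and valid. The data values and the names filtered, present, ordered and raw are assumptions for illustration, not taken from the original files.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's objects (the real lab has 3299 rows).
lab = pd.DataFrame({0: [-1, -1, -1, 81, 108]},
                   index=[39333, 39173, 38661, 44801, 44765])
valid = np.array([44765, 38661, 44593])  # 44593 is deliberately absent from lab

# Boolean filter: keep rows whose index label occurs in valid, in lab's order.
# Index.isin replaces the lambda-based map from the question.
filtered = lab[lab.index.isin(valid)]

# Label lookup: rows in valid's order. Restrict to labels that exist first,
# because .loc raises KeyError for missing labels in recent pandas versions.
present = valid[np.isin(valid, lab.index)]
ordered = lab.loc[present]

# Raw numpy array of the single column (named 0, as in the question's output).
raw = ordered[0].values

print(filtered)
print(ordered)
print(raw)
```

The boolean filter keeps the row order of lab, while the .loc lookup follows the order of valid, so which one fits depends on which ordering you need.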
Upvotes: 1