Talia

Reputation: 3137

Converting pandas object to numpy array

I have a simple code to find similar rows in a dataset.

import numpy as np

h=0
count=0
#227690
similarIndexes=np.zeros((143,))
# note: at i == 0, data[i-1,2] refers to the last row of a numpy array
for i in np.arange(len(data)):
    if(data[i-1,2]==data[i,2]):
        similarIndexes[h]=int(i)
        h=h+1
        count=count+1
        print("similar found in -->", i," there are--->", count)

It works correctly when data is a numpy.ndarray, but if data is a pandas object, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in smilarData
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3186, in _find_block
    self._check_have(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named (-1, 2)'

What should I do to make this code work? If converting the pandas object to a numpy array would help, how can I do that?

Upvotes: 2

Views: 6412

Answers (3)

Yannis P.

Reputation: 2765

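I agree with the other answers, but if you want to work directly with pandas objects, note that a DataFrame has its own way of accessing items by position. Writing data[i-1,2] on a DataFrame is treated as a column lookup with the key (i-1, 2), which is why the KeyError names (-1, 2). In your code you should write e.g.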

if(data.iloc[i-1,2]==data.iloc[i,2]):

See the pandas documentation on indexing for more.
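As a minimal sketch of how the loop from the question might look with .iloc (the sample DataFrame and the name similar_indexes are made up for illustration, and the loop starts at 1 so that i - 1 never wraps around to the last row):

```python
import pandas as pd

# Hypothetical data; the third column (position 2) is the one being compared.
data = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 9, 7, 7]})

similar_indexes = []
# Start at 1 so that i - 1 never refers to the last row.
for i in range(1, len(data)):
    # .iloc takes integer positions: [row, column]
    if data.iloc[i - 1, 2] == data.iloc[i, 2]:
        similar_indexes.append(i)

print(similar_indexes)
```

Collecting the positions in a list avoids having to guess the size of a preallocated array up front.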

Upvotes: 0

redacted

Reputation: 3959

I cannot comment on Adrienne's answer yet, so I would like to add that DataFrames have a built-in method to convert a df to an array (i.e. a matrix):

>>> df = pd.DataFrame({"a":range(5),"b":range(5,10)})
>>> df
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
>>> mat = df.as_matrix()  # removed in pandas 1.0; use df.values or df.to_numpy() there
>>> mat
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
>>> col = [x[0] for x in mat]  # to get a certain column
>>> col
[0, 1, 2, 3, 4]

Also, to find duplicated rows you can do:

>>> df2
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
5  0  5
>>> df2[df2.duplicated()]
   a  b
5  0  5
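Note that duplicated() flags rows that repeat anywhere in the frame, whereas the loop in the question compares each row with the one directly before it in a single column. A rough sketch of that adjacent comparison without an explicit loop, assuming a made-up frame and using Series.shift (the names same_as_previous and positions are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data; column "a" holds the values being compared.
df2 = pd.DataFrame({"a": [0, 1, 1, 3, 3, 3], "b": [5, 6, 7, 8, 9, 9]})

col = df2.iloc[:, 0]
# shift() moves every value down one row, so each row is compared with its predecessor.
# The first row is compared with NaN and therefore never matches.
same_as_previous = col == col.shift()

# Integer positions of the rows that equal the row before them.
positions = np.flatnonzero(same_as_previous.values).tolist()
print(positions)
```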

Upvotes: 1

Adrienne

Reputation: 183

To convert a pandas dataframe to a numpy array:

import numpy as np
np.array(dataFrame)
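A short sketch of what the conversion gives you, using a made-up dataFrame. The .values attribute is shown alongside np.array as an equivalent route; once the data is an ndarray, the [row, column] indexing from the question works as expected:

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for illustration.
dataFrame = pd.DataFrame({"a": [0, 1, 2], "b": [5, 6, 7]})

arr = np.array(dataFrame)   # conversion as in the answer above
vals = dataFrame.values     # attribute that returns the same data as an ndarray

print(type(arr), arr.shape)
print(arr[1, 1])            # numpy-style [row, column] indexing
```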

Upvotes: 1
