Reputation: 3137
I have a simple piece of code to find similar rows in a dataset.
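h = 0
count = 0
# 227690
similarIndexes = np.zeros((143,))
len(data)
for i in np.arange(len(data)):
    # compare column 2 of each row with the previous row
    # (at i == 0 the index i-1 is -1, i.e. the last row of an ndarray)
    if data[i-1, 2] == data[i, 2]:
        similarIndexes[h] = int(i)
        h = h + 1
        count = count + 1
        print("similar found in -->", i, " there are--->", count)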
It works correctly when data is a numpy.ndarray, but if data is a pandas object, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in smilarData
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3186, in _find_block
    self._check_have(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named (-1, 2)'
What should I do to use this code with a pandas object? If converting the pandas object to a numpy array would help, how can I do that?
Upvotes: 2
Views: 6412
Reputation: 2765
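I agree with the previous answers, but in case you want to work directly with pandas
objects: a DataFrame has its own way of accessing items. Writing data[i-1, 2] makes pandas
look for a column labelled (i-1, 2), which is why you get the KeyError. For positional
access use .iloc, so in your code you should write e.g.
    if data.iloc[i-1, 2] == data.iloc[i, 2]:
See the pandas indexing documentation for more.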
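As a rough illustration of the .iloc approach, here is a minimal sketch on a small made-up DataFrame (the column names and values are hypothetical, not the asker's data). It starts the loop at 1 so that row 0 is not compared with the last row, and also shows a loop-free variant using shift(), which is one possible alternative rather than the only way to do it.

```python
import pandas as pd

# Hypothetical data: the column at position 2 ("c") is the one being compared.
data = pd.DataFrame({"a": [1, 2, 3, 4],
                     "b": [5, 6, 7, 8],
                     "c": [9, 9, 3, 3]})

# Loop version: positional access with .iloc instead of data[i-1, 2].
similar = []
for i in range(1, len(data)):  # start at 1: row 0 has no previous row
    if data.iloc[i - 1, 2] == data.iloc[i, 2]:
        similar.append(i)

# Loop-free version: compare the column with itself shifted down by one row.
col = data.iloc[:, 2]
mask = col.eq(col.shift())          # True where a row equals the previous row
similar_vec = mask[mask].index.tolist()

print(similar, similar_vec)
```

Both lists hold the positions of rows whose third column equals that of the row just before them.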
Upvotes: 0
Reputation: 3959
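I cannot comment on Adrienne's answer yet, so I would like to add that DataFrames have a built-in method to convert a DataFrame to an array (i.e. a matrix). Note that as_matrix() was deprecated in pandas 0.23 and removed in 1.0; on newer versions use df.values or df.to_numpy() instead.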
>>> df = pd.DataFrame({"a":range(5),"b":range(5,10)})
>>> df
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
>>> mat = df.as_matrix()
>>> mat
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
>>> col = [x[0] for x in mat]  # to get a certain column
>>> col
[0, 1, 2, 3, 4]
Also, to find duplicated rows you can do:
>>> df2
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
5  0  5
>>> df2[df2.duplicated()]
   a  b
5  0  5
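For reference, a small sketch of how duplicated() behaves with its keep and subset parameters, rebuilding the df2 shown above (the variable names for the results are made up for illustration). By default only the later copies are flagged; keep=False flags every copy, and subset restricts the comparison to chosen columns.

```python
import pandas as pd

# Same contents as the df2 shown above.
df2 = pd.DataFrame({"a": [0, 1, 2, 3, 4, 0],
                    "b": [5, 6, 7, 8, 9, 5]})

# Default (keep="first"): the first occurrence is not flagged, later copies are.
later_copies = df2[df2.duplicated()]

# keep=False: flag every row that has a duplicate, including the first one.
all_copies = df2[df2.duplicated(keep=False)]

# subset: compare only column "a" when deciding what counts as a duplicate.
same_a = df2[df2.duplicated(subset=["a"], keep=False)]

print(later_copies.index.tolist())
print(all_copies.index.tolist())
print(same_a.index.tolist())
```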
Upvotes: 1
Reputation: 183
To convert a pandas dataframe to a numpy array:
import numpy as np
np.array(dataFrame)
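A short sketch of the conversion on a made-up DataFrame (the data here is hypothetical). Once converted, the tuple-style indexing from the question works again. The to_numpy() call assumes pandas 0.24 or newer; on older versions np.array(dataFrame) or dataFrame.values does the same job.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data.
dataFrame = pd.DataFrame({"a": [1, 2, 3],
                          "b": [4, 5, 6],
                          "c": [7, 7, 8]})

arr = np.array(dataFrame)      # works on old and new pandas versions
arr2 = dataFrame.to_numpy()    # assumes pandas >= 0.24

# After conversion, arr[row, column] indexing behaves as in the original code.
same = arr[0, 2] == arr[1, 2]  # compare column 2 of rows 0 and 1

print(arr.shape, same)
```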
Upvotes: 1