Reputation: 93
I have a pandas DataFrame with thousands of rows and 4 columns, e.g.:
A B C D
1 1 2 0
3 3 2 1
3 1 1 0
....
Is there any way to count how many times a certain row occurs? For example, how many times does [3,1,1,0] appear, and how can I get the indices of those rows?
Upvotes: 2
Views: 2437
Reputation: 97291
You can also use a MultiIndex; once it is sorted, looking up the count is faster:
from io import StringIO
import pandas as pd

s = StringIO("""A B C D
1 1 2 0
3 3 2 1
3 1 1 0
3 1 1 0
3 3 2 1
1 2 3 4""")
df = pd.read_csv(s, sep=r"\s+")
# Key each row's original position by the row's values
s = pd.Series(range(len(df)), index=pd.MultiIndex.from_arrays(df.values.T))
s = s.sort_index()  # lookups are faster on a sorted MultiIndex
idx = s.loc[(3, 1, 1, 0)]
print(idx.count(), idx.values)
output:
2 [2 3]
Upvotes: 1
Reputation: 353059
If you're only looking for one row, then I might do something like
>>> df.index[(df == [3, 1, 1, 0]).all(axis=1)]
Int64Index([2, 3], dtype=int64)
--
Explanation follows. Starting from:
>>> df
A B C D
0 1 1 2 0
1 3 3 2 1
2 3 1 1 0
3 3 1 1 0
4 3 3 2 1
5 1 2 3 4
We compare against our target:
>>> df == [3,1,1,0]
A B C D
0 False True False True
1 True False False False
2 True True True True
3 True True True True
4 True False False False
5 False False False False
Find the ones which match:
>>> (df == [3,1,1,0]).all(axis=1)
0 False
1 False
2 True
3 True
4 False
5 False
And use this boolean Series to select from the index:
>>> df.index[(df == [3,1,1,0]).all(axis=1)]
Int64Index([2, 3], dtype=int64)
If you need the occurrences of every row rather than just one, repeating the above for each row is slow, and there are much faster ways to locate all the rows at once. For a single row, though, this should work well enough.
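As a rough sketch of both cases: the mask from above also gives the count directly, and grouping by every column is one possible way to handle all rows at once. The `groupby` approach is my assumption about what a faster route could look like, not necessarily the one meant above, and the names `count`, `labels`, `counts` and `labels_by_row` are only illustrative.

```python
import pandas as pd

# Sample data copied from the example above.
df = pd.DataFrame(
    [[1, 1, 2, 0], [3, 3, 2, 1], [3, 1, 1, 0],
     [3, 1, 1, 0], [3, 3, 2, 1], [1, 2, 3, 4]],
    columns=list("ABCD"),
)

# One row: the boolean mask yields both the count and the index labels.
mask = (df == [3, 1, 1, 0]).all(axis=1)
count = int(mask.sum())
labels = df.index[mask].tolist()

# All rows at once (assumed alternative): group by every column.
grouped = df.groupby(list(df.columns))
counts = grouped.size()  # Series indexed by (A, B, C, D)
labels_by_row = {key: idx.tolist() for key, idx in grouped.groups.items()}

print(count, labels)
print(counts)
```

Here `counts.loc[(3, 1, 1, 0)]` gives the number of occurrences of that row, and `labels_by_row[(3, 1, 1, 0)]` gives the index labels where it appears.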
Upvotes: 4
Reputation: 67073
First create a sample array:
>>> import numpy as np
>>> x = np.array([[1, 1, 2, 0],
...               [3, 3, 2, 1],
...               [3, 1, 1, 0],
...               [0, 1, 2, 3],
...               [3, 1, 1, 0]])
Then create a view of the array where each row is a single element:
>>> y = x.view([('', x.dtype)] * x.shape[1])
>>> y
array([[(1, 1, 2, 0)],
[(3, 3, 2, 1)],
[(3, 1, 1, 0)],
[(0, 1, 2, 3)],
[(3, 1, 1, 0)]],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])
Do the same thing with the element you want to find:
>>> e = np.array([[3, 1, 1, 0]])
>>> tofind = e.view([('', e.dtype)] * e.shape[1])
And now you can look for the element:
>>> y == tofind[0]
array([[False],
[False],
[ True],
[False],
[ True]], dtype=bool)
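To get from that boolean array to the count and the row indices the question asks for, something like the following should work. It is a minimal sketch that repeats the steps above in one piece; `matches`, `indices` and `count` are names I made up for illustration.

```python
import numpy as np

x = np.array([[1, 1, 2, 0],
              [3, 3, 2, 1],
              [3, 1, 1, 0],
              [0, 1, 2, 3],
              [3, 1, 1, 0]])
e = np.array([[3, 1, 1, 0]])

# View each row as a single structured element, as in the steps above.
y = x.view([('', x.dtype)] * x.shape[1])
tofind = e.view([('', e.dtype)] * e.shape[1])

# Flatten the (n, 1) comparison result to one boolean per row.
matches = (y == tofind[0]).ravel()
indices = np.flatnonzero(matches)  # positions of the matching rows
count = int(matches.sum())         # number of matching rows

print(count, indices)
```

For a plain numeric array, `(x == e).all(axis=1)` produces the same per-row mask without the structured view.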
Upvotes: 1