Reputation: 93

Count occurrences of a row with pandas in Python

I have a pandas DataFrame with thousands of rows and 4 columns, e.g.:

A B C D 
1 1 2 0
3 3 2 1
3 1 1 0
....

Is there any way to count how many times a certain row occurs? For example, how many times does [3,1,1,0] occur, and how can I get the indices of those rows?

Upvotes: 2

Answers (3)

HYRY

Reputation: 97291

You can also use a MultiIndex. Once it is sorted, looking up the count is faster:

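from io import StringIO
import pandas as pd

s = StringIO("""A  B  C  D
1  1  2  0
3  3  2  1
3  1  1  0
3  1  1  0
3  3  2  1
1  2  3  4""")
df = pd.read_csv(s, sep=r"\s+")
# Series of row positions, indexed by a MultiIndex built from the row values
s = pd.Series(range(len(df)), index=pd.MultiIndex.from_arrays(df.values.T))
s = s.sort_index()  # lookups are faster on a sorted index
idx = s.loc[(3, 1, 1, 0)]  # positions of every row equal to (3, 1, 1, 0)
print(idx.count(), idx.values)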

output:

2 [2 3]

Upvotes: 1

DSM

Reputation: 353059

If you're only looking for one row, then I might do something like

>>> df.index[(df == [3, 1, 1, 0]).all(axis=1)]
Int64Index([2, 3], dtype=int64)

Explanation follows. Starting from:

>>> df
   A  B  C  D
0  1  1  2  0
1  3  3  2  1
2  3  1  1  0
3  3  1  1  0
4  3  3  2  1
5  1  2  3  4

We compare against our target:

>>> df == [3,1,1,0]
       A      B      C      D
0  False   True  False   True
1   True  False  False  False
2   True   True   True   True
3   True   True   True   True
4   True  False  False  False
5  False  False  False  False

Find the ones which match:

>>> (df == [3,1,1,0]).all(axis=1)
0    False
1    False
2     True
3     True
4    False
5    False

And use this boolean Series to select from the index:

>>> df.index[(df == [3,1,1,0]).all(axis=1)]
Int64Index([2, 3], dtype=int64)
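Since the question asks for both the count and the indices, here is a minimal sketch that pulls both from the same boolean mask. It rebuilds the sample frame so it runs on its own; the names `target`, `mask`, `count` and `indices` are just illustrative.

```python
import pandas as pd

# Sample frame matching the one shown above.
df = pd.DataFrame(
    [[1, 1, 2, 0], [3, 3, 2, 1], [3, 1, 1, 0],
     [3, 1, 1, 0], [3, 3, 2, 1], [1, 2, 3, 4]],
    columns=list("ABCD"),
)

target = [3, 1, 1, 0]              # row to look for
mask = (df == target).all(axis=1)  # True where every column matches

count = int(mask.sum())            # number of matching rows
indices = df.index[mask].tolist()  # index labels of those rows
print(count, indices)
```

Summing the mask works because each True counts as 1.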

If you instead need to do this repeatedly, locating every distinct row at once, there are much faster ways than running the above again and again. For a single row, though, this should work well enough.
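One possible way to locate all rows at once (an assumption on my part, not necessarily the approach meant above) is to group on every column. A rough sketch, with `locations` and `counts` as illustrative names:

```python
import pandas as pd

# Sample frame matching the one shown above.
df = pd.DataFrame(
    [[1, 1, 2, 0], [3, 3, 2, 1], [3, 1, 1, 0],
     [3, 1, 1, 0], [3, 3, 2, 1], [1, 2, 3, 4]],
    columns=list("ABCD"),
)

# Group identical rows together; .groups maps each distinct row
# (as a tuple of its values) to the index labels where it occurs.
groups = df.groupby(list(df.columns)).groups

locations = {row: idx.tolist() for row, idx in groups.items()}
counts = {row: len(idx) for row, idx in locations.items()}

print(locations[(3, 1, 1, 0)], counts[(3, 1, 1, 0)])
```

Note that groupby drops rows containing NaN in the grouping columns by default, so this sketch assumes the frame has no missing values.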

Upvotes: 4

jterrace

Reputation: 67073

First create a sample array:

>>> import numpy as np
>>> x = np.array([[1, 1, 2, 0],
...               [3, 3, 2, 1],
...               [3, 1, 1, 0],
...               [0, 1, 2, 3],
...               [3, 1, 1, 0]])

Then create a view of the array where each row is a single element:

>>> y = x.view([('', x.dtype)] * x.shape[1])
>>> y
array([[(1, 1, 2, 0)],
       [(3, 3, 2, 1)],
       [(3, 1, 1, 0)],
       [(0, 1, 2, 3)],
       [(3, 1, 1, 0)]], 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])

Do the same thing with the element you want to find:

>>> e = np.array([[3, 1, 1, 0]])
>>> tofind = e.view([('', e.dtype)] * e.shape[1])

And now you can look for the element:

>>> y == tofind[0]
array([[False],
       [False],
       [ True],
       [False],
       [ True]], dtype=bool)
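To get from that boolean array to the count and the row indices the question asks for, something like the following should work. It is a sketch that repeats the steps above so it runs on its own; `matches`, `indices` and `count` are illustrative names, and it assumes `x` is a C-contiguous array so the view is valid.

```python
import numpy as np

x = np.array([[1, 1, 2, 0],
              [3, 3, 2, 1],
              [3, 1, 1, 0],
              [0, 1, 2, 3],
              [3, 1, 1, 0]])
e = np.array([[3, 1, 1, 0]])

# One structured element per row, as in the steps above.
y = x.view([('', x.dtype)] * x.shape[1])
tofind = e.view([('', e.dtype)] * e.shape[1])

matches = (y == tofind[0]).ravel()  # flatten the (n, 1) result to (n,)
indices = np.flatnonzero(matches)   # positions of the matching rows
count = int(matches.sum())          # number of matching rows
print(count, indices)
```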

Upvotes: 1
