Kshitij Yadav

Reputation: 1387

Count Re-occurrence of a value in python

I have a data set which contains something like this:

SNo  Cookie
1       A
2       A
3       A
4       B
5       C
6       D
7       A
8       B
9       D
10      E
11      D
12      A

So let's say we have 5 cookies: A, B, C, D and E. I want to count how many times each cookie reoccurs after a different cookie has been encountered. In the example above, cookie A is encountered again at the 7th position and then again at the 12th. NOTE: we don't count A at the 2nd or 3rd position, because those are consecutive repeats with no other cookie in between. At positions 7 and 12, however, other cookies had been seen before A came back, so those instances are counted. So essentially I want something like this:

Sno Cookie  Count
 1     A     2
 2     B     1
 3     C     0
 4     D     2
 5     E     0
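
For anyone who wants to reproduce this, the sample can be built as a small DataFrame. This is only a sketch: the use of pandas and the names `df` and `expected` are assumptions, chosen to match the names used in the answers.

```python
import pandas as pd

# Sample data from the question; the variable name `df` is an assumption.
df = pd.DataFrame({
    "SNo": range(1, 13),
    "Cookie": list("AAABCDABDEDA"),
})

# Desired re-occurrence counts, copied from the table above.
expected = pd.Series({"A": 2, "B": 1, "C": 0, "D": 2, "E": 0})
```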

Can anyone suggest the logic or Python code for this?

Upvotes: 1

Views: 95

Answers (3)

piRSquared

Reputation: 294218

pandas.factorize and numpy.bincount

  1. If immediately repeated values are not counted then remove them.
  2. Do a normal counting of values on what's left.
  3. However, that is one more than what is asked for, so subtract one.

  1. factorize
  2. Filter out immediate repeats
  3. bincount
  4. Produce pandas.Series

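import numpy as np
import pandas as pd

i, r = pd.factorize(df.Cookie)            # integer codes and unique labels
mask = np.append(True, i[:-1] != i[1:])   # False where a row repeats the previous one
cnts = np.bincount(i[mask]) - 1           # count what's left, minus the first sighting

pd.Series(cnts, r)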

A    2
B    1
C    0
D    2
E    0
dtype: int64
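
To see what each step produces, here is the same idea spelled out on the question's data. It is a sketch rather than a different method; the names `cookie`, `codes`, `labels`, `keep` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample column from the question, rebuilt so the sketch is self-contained.
cookie = pd.Series(list("AAABCDABDEDA"), name="Cookie")

# Step 1: one integer code per row, plus the unique labels in order of first appearance.
codes, labels = pd.factorize(cookie)
# codes  -> [0 0 0 1 2 3 0 1 3 4 3 0]
# labels -> A, B, C, D, E

# Step 2: keep the first row and every row whose code differs from the previous row.
keep = np.append(True, codes[:-1] != codes[1:])

# Step 3: count the kept rows per code, then subtract the first sighting of each.
counts = np.bincount(codes[keep]) - 1

result = pd.Series(counts, index=labels)
```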

pandas.value_counts

Zip the list of cookies with a lagged copy of itself and keep only the values that differ from their predecessor.

c = df.Cookie.tolist()

pd.Series([a for a, b in zip(c, [None] + c) if a != b]).value_counts().sort_index() - 1

A    2
B    1
C    0
D    2
E    0
dtype: int64

defaultdict

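from collections import defaultdict

def count(s):
  d = defaultdict(lambda: -1)  # start at -1 so the first sighting brings the count to 0
  x = None                     # previous value
  for y in s:
    d[y] += y != x             # adds 1 (True) only when y differs from the previous value
    x = y

  return pd.Series(d)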

count(df.Cookie)

A    2
B    1
C    0
D    2
E    0
dtype: int64
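
The loop above leans on two tricks: `y != x` is a bool that adds as 0 or 1, and the `-1` default cancels out the first sighting. A more explicit version of the same logic in plain Python might look like this; the function name `count_reoccurrences` is hypothetical, and it returns a dict rather than a Series.

```python
def count_reoccurrences(values):
    """Count how often each value comes back after a different value."""
    counts = {}
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value not in counts:
            counts[value] = 0      # first sighting is not a re-occurrence
        elif value != previous:
            counts[value] += 1     # seen before, and not an immediate repeat
        previous = value
    return counts


result = count_reoccurrences("AAABCDABDEDA")
```

Wrapping the result in `pd.Series(result)` should give the same shape of output as the function above.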

Upvotes: 1

DYZ

Reputation: 57033

Start by removing consecutive duplicates, then count the survivors:

no_dups = df[df.Cookie != df.Cookie.shift()] # Borrowed from @sacul
no_dups.groupby('Cookie').count() - 1
#        SNo
#Cookie     
#A         2
#B         1
#C         0
#D         2
#E         0

Upvotes: 2

sacuL

Reputation: 51335

One way to do this is to first drop consecutive repeats of a Cookie, then use duplicated to flag where a Cookie has been seen before, and finally group by Cookie and sum the flags:

no_doubles = df[df.Cookie != df.Cookie.shift()].copy()  # .copy() avoids SettingWithCopyWarning

no_doubles['dups'] = no_doubles.Cookie.duplicated()

no_doubles.groupby('Cookie').dups.sum()

This gives you:

Cookie
A    2.0
B    1.0
C    0.0
D    2.0
E    0.0
Name: dups, dtype: float64
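
If integer counts are preferred over the float output shown above, an explicit cast is one option. The following is a self-contained sketch on the question's data; the `astype(int)` cast and the name `result` are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"SNo": range(1, 13), "Cookie": list("AAABCDABDEDA")})

# Drop rows that repeat the previous row's cookie; .copy() gives an independent frame.
no_doubles = df[df.Cookie != df.Cookie.shift()].copy()

# True for every sighting of a cookie after its first one.
no_doubles["dups"] = no_doubles.Cookie.duplicated()

# Summing booleans counts the True values; cast so the result is integer typed.
result = no_doubles.groupby("Cookie").dups.sum().astype(int)
```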

Upvotes: 3
