Grouping where column is different

Question

I have a data-frame (df) which looks like:

import pandas as pd

df = pd.DataFrame({'quarter': ['Q1','Q1','Q1','Q2','Q2','Q2','Q3','Q3','Q3','Q4','Q4','Q4'],
                   'id': [1,2,3,1,2,3,1,2,3,1,2,3],
                   'score': ['DD','DD','DD','D','DD','DD','D','D','D','D','D','D']})



   quarter  id   score
0       Q1   1      DD
1       Q1   2      DD
2       Q1   3      DD
3       Q2   1       D
4       Q2   2      DD
5       Q2   3      DD
6       Q3   1       D
7       Q3   2       D
8       Q3   3       D
9       Q4   1       D
10      Q4   2       D
11      Q4   3       D

I am trying to count, for each quarter, the ids whose score differs from their score in the previous quarter. For example, an id is counted for Q2 if its score was DD in Q1 and D in Q2. In the end I want an output that looks like:

   count
Q1    
Q2    1
Q3    2
Q4    0

There is no count value for Q1 as there was no previous quarter to compare.

I have tried groupby, but I can't work out how to bring in the previous quarter's score for a specific id.

df.groupby(['quarter','id']).size().reset_index().groupby('quarter').count()

jezrael · Accepted Answer

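First reshape with pivot so that each row is a quarter and each column an id. Then compare the values with their shifted (previous-quarter) values using ne (not equal) and count the True values per row with sum. Finally, set the first value to NaN, because Q1 has no previous quarter: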

import numpy as np

df1 = df.pivot(index='quarter', columns='id', values='score')
s = df1.shift().ne(df1).sum(axis=1).astype(float)
# Q1 has no previous quarter to compare against
s.iat[0] = np.nan
print(s)
Q1    NaN
Q2    1.0
Q3    2.0
Q4    0.0
dtype: float64
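
If you would rather stay with groupby, as in the question, a rough alternative is to shift the score within each id and then count the changes per quarter. This is only a sketch, not part of the answer above: it assumes the quarter labels sort chronologically as strings (true for Q1 to Q4), and the names prev, changed and counts are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'quarter': ['Q1','Q1','Q1','Q2','Q2','Q2','Q3','Q3','Q3','Q4','Q4','Q4'],
                   'id': [1,2,3,1,2,3,1,2,3,1,2,3],
                   'score': ['DD','DD','DD','D','DD','DD','D','D','D','D','D','D']})

# Order rows so that, within each id, quarters are consecutive
# (assumption: quarter labels sort chronologically as strings)
df = df.sort_values(['id', 'quarter'])

# Previous quarter's score for the same id (NaN for each id's first quarter)
prev = df.groupby('id')['score'].shift()

# A change needs a previous score to exist and to differ from the current one
changed = prev.notna() & df['score'].ne(prev)

# Count changed ids per quarter
counts = changed.groupby(df['quarter']).sum().astype(float)

# The first quarter has nothing to compare against, so mark it as undefined
counts.iloc[0] = np.nan
print(counts)
```

This avoids the reshape, which may be convenient when some ids are missing in some quarters, since pivot would then introduce NaN cells that compare as "different".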
