Mario Arend
Mario Arend

Reputation: 479

Construct new column with first row of a groupby with two columns - Pandas

I am trying to construct a new column that gives a value of 1 if it's the first time that an element of the column "type" has had a specific value of the column "xx", and gives a value of 0 in any other case.

The error message says

ValueError: Wrong number of items passed 0, placement implies 1.

I tried other metodhos, such as nth(0), but it doesn't work either. Any suggestion of how solving this problem is very welcome.

Upvotes: 7

Views: 399

Answers (4)

BENY
BENY

Reputation: 323226

IIUC

idx=df.groupby([df.index.get_level_values(1),df.xx]).head(1).index
df.loc[:,'new']=0
df.loc[idx,'new']=1
df
Out[869]: 
                     xx  new
date       type             
2018-01-01 A     1000.0    1
           B     1000.0    1
2018-02-01 B     1200.0    1
2018-03-01 A      800.0    1
           B      800.0    1
           C      800.0    1
2018-04-01 A     1000.0    0
           B     1000.0    0
2018-05-01 B      800.0    0
2018-06-01 A     1200.0    1
           B     1200.0    0
           C     1200.0    1
2018-07-01 A     1000.0    0
2018-08-01 B     1000.0    0
           C     1000.0    1
2018-09-01 A     1600.0    1
           B     1600.0    1
2018-10-01 C     1000.0    0
           A      800.0    0
           B      800.0    0
2018-11-01 A     1000.0    0
2018-12-01 B     1600.0    0
           C     1600.0    1

Upvotes: 1

piRSquared
piRSquared

Reputation: 294218

duplicated

  • Generate of list of tuples with type and xx
  • Wrap it in a pandas.Series because I want to use the pandas.Series.duplicated method
  • Use numpy.where to choose between 0 and 1

Note: This does not use groupby and consequently should be more efficient.


s = pd.Series([*zip(df.index.get_level_values('type'), df.xx)])
df.assign(id=np.where(s.duplicated(), 0, 1))

                     xx  id
date       type            
2018-01-01 A     1000.0   1
           B     1000.0   1
2018-02-01 B     1200.0   1
2018-03-01 A      800.0   1
           B      800.0   1
           C      800.0   1
2018-04-01 A     1000.0   0
           B     1000.0   0
2018-05-01 B      800.0   0
2018-06-01 A     1200.0   1
           B     1200.0   0
           C     1200.0   1
2018-07-01 A     1000.0   0
2018-08-01 B     1000.0   0
           C     1000.0   1
2018-09-01 A     1600.0   1
           B     1600.0   1
2018-10-01 C     1000.0   0
           A      800.0   0
           B      800.0   0
2018-11-01 A     1000.0   0
2018-12-01 B     1600.0   0
           C     1600.0   1

Upvotes: 4

Yuca
Yuca

Reputation: 6091

Using groupby + cumcount + astype :

df['yy'] = df.reset_index().groupby(['type','xx']).cumcount().eq(0).astype(int).values

Result:

                     xx  yy
date       type            
2018-01-01 A     1000.0   1
           B     1000.0   1
2018-02-01 B     1200.0   1
2018-03-01 A      800.0   1
           B      800.0   1
           C      800.0   1
2018-04-01 A     1000.0   0
           B     1000.0   0
2018-05-01 B      800.0   0
2018-06-01 A     1200.0   1
           B     1200.0   0
           C     1200.0   1
2018-07-01 A     1000.0   0
2018-08-01 B     1000.0   0
           C     1000.0   1
2018-09-01 A     1600.0   1
           B     1600.0   1
2018-10-01 C     1000.0   0
           A      800.0   0
           B      800.0   0
2018-11-01 A     1000.0   0
2018-12-01 B     1600.0   0
           C     1600.0   1

Upvotes: 5

Quang Hoang
Quang Hoang

Reputation: 150735

Try:

df['yy'] = (df.groupby(level=1).xx
              .apply(lambda x: (~x.duplicated()).astype(int))
           )

df['yy']

outputs:

date        type
2018-01-01  A       1
            B       1
2018-02-01  B       1
2018-03-01  A       1
            B       1
            C       1
2018-04-01  A       0
            B       0
2018-05-01  B       0
2018-06-01  A       1
            B       0
            C       1
2018-07-01  A       0
2018-08-01  B       0
            C       1
2018-09-01  A       1
            B       1
2018-10-01  C       0
            A       0
            B       0
2018-11-01  A       0
2018-12-01  B       0
            C       1
Name: yy, dtype: int32

Upvotes: 6

Related Questions