user2906657
user2906657

Reputation: 541

Data Manipulation on dummy variables python

I have a data frame as follows(sample shown below)

Date       event  A B C D
2015-10-01  A     1 0 0 0
2015-10-02  A|B   1 1 0 0
2015-11-01  C     0 0 1 0 
2015-10-07  D     0 0 0 1

Here A indicates only event A occurred and A|B indicates both A and B occurred on a particular day. Now I have already formed dummy variables for the event as indicated above. Subsequently I want to club B,C and D into one event called 'minor'. So my output would look like:

Date       event  A  Minor
2015-10-01  A     1   0
2015-10-02  A|B   1   1
2015-11-01  C     0   1
2015-10-07  D     0   1

As you can see for second row,Minor is equal to 1 as B also occurs on that day. I am little confused how to approach this. In actual I have 20+ columns which need to be grouped into minor.Any help would be appreciated.

Upvotes: 1

Views: 50

Answers (2)

jezrael
jezrael

Reputation: 862451

I think you need DataFrame.max if need only 0 and 1 output:

cols = ['B','C','D']

df['Minor'] = df[cols].max(axis=1)
df = df.drop(cols, axis=1)
print (df)
         Date event  A  Minor
0  2015-10-01     A  1      0
1  2015-10-02   A|B  1      1
2  2015-11-01     C  0      1
3  2015-10-07     D  0      1

Or if need sum values use DataFrame.sum:

df['Minor'] = df[cols].sum(axis=1)
df = df.drop(cols, axis=1)
print (df)
         Date event  A  Minor
0  2015-10-01     A  1      0
1  2015-10-02   A|B  1      1
2  2015-11-01     C  0      1
3  2015-10-07     D  0      1

Difference in changed df:

print (df)
         Date event  A  B  C  D
0  2015-10-01     A  1  0  0  0
1  2015-10-02   A|B  1  1  0  0
2  2015-11-01   C|D  0  0  1  1
3  2015-10-07     D  0  0  0  1

cols = ['B','C','D']

df['Minor'] = df[cols].sum(axis=1)
df = df.drop(cols, axis=1)
print (df)
         Date event  A  Minor
0  2015-10-01     A  1      0
1  2015-10-02   A|B  1      1
2  2015-11-01   C|D  0      2
3  2015-10-07     D  0      1

df['Minor'] = df[cols].max(axis=1)
df = df.drop(cols, axis=1)
print (df)
         Date event  A  Minor
0  2015-10-01     A  1      0
1  2015-10-02   A|B  1      1
2  2015-11-01   C|D  0      1
3  2015-10-07     D  0      1

Upvotes: 1

piRSquared
piRSquared

Reputation: 294218

Using assign + drop

cols = list('BCD')
df.assign(Minor=df[cols].sum(1)).drop(cols, 1)

        Date event  A  Minor
0 2015-10-01     A  1      0
1 2015-10-02   A|B  1      1
2 2015-11-01     C  0      1
3 2015-10-07     D  0      1

Upvotes: 2

Related Questions