Reputation: 541
I have a data frame as follows(sample shown below)
Date event A B C D
2015-10-01 A 1 0 0 0
2015-10-02 A|B 1 1 0 0
2015-11-01 C 0 0 1 0
2015-10-07 D 0 0 0 1
Here A indicates only event A occurred and A|B indicates both A and B occurred on a particular day. Now I have already formed dummy variables for the event as indicated above. Subsequently I want to club B,C and D into one event called 'minor'. So my output would look like:
Date event A Minor
2015-10-01 A 1 0
2015-10-02 A|B 1 1
2015-11-01 C 0 1
2015-10-07 D 0 1
As you can see for second row,Minor is equal to 1 as B also occurs on that day. I am little confused how to approach this. In actual I have 20+ columns which need to be grouped into minor.Any help would be appreciated.
Upvotes: 1
Views: 50
Reputation: 862451
I think you need DataFrame.max
if need only 0
and 1
output:
cols = ['B','C','D']
df['Minor'] = df[cols].max(axis=1)
df = df.drop(cols, axis=1)
print (df)
Date event A Minor
0 2015-10-01 A 1 0
1 2015-10-02 A|B 1 1
2 2015-11-01 C 0 1
3 2015-10-07 D 0 1
Or if need sum values use DataFrame.sum
:
df['Minor'] = df[cols].sum(axis=1)
df = df.drop(cols, axis=1)
print (df)
Date event A Minor
0 2015-10-01 A 1 0
1 2015-10-02 A|B 1 1
2 2015-11-01 C 0 1
3 2015-10-07 D 0 1
Difference in changed df
:
print (df)
Date event A B C D
0 2015-10-01 A 1 0 0 0
1 2015-10-02 A|B 1 1 0 0
2 2015-11-01 C|D 0 0 1 1
3 2015-10-07 D 0 0 0 1
cols = ['B','C','D']
df['Minor'] = df[cols].sum(axis=1)
df = df.drop(cols, axis=1)
print (df)
Date event A Minor
0 2015-10-01 A 1 0
1 2015-10-02 A|B 1 1
2 2015-11-01 C|D 0 2
3 2015-10-07 D 0 1
df['Minor'] = df[cols].max(axis=1)
df = df.drop(cols, axis=1)
print (df)
Date event A Minor
0 2015-10-01 A 1 0
1 2015-10-02 A|B 1 1
2 2015-11-01 C|D 0 1
3 2015-10-07 D 0 1
Upvotes: 1
Reputation: 294218
Using assign
+ drop
cols = list('BCD')
df.assign(Minor=df[cols].sum(1)).drop(cols, 1)
Date event A Minor
0 2015-10-01 A 1 0
1 2015-10-02 A|B 1 1
2 2015-11-01 C 0 1
3 2015-10-07 D 0 1
Upvotes: 2