Jui Sen
Jui Sen

Reputation: 377

Generating new column based on another column

With a data frame:

df = pd.DataFrame({"ID": [1, 1, 1,1,2,2, 2, 2,3,3,3,3] ,"year": [2004, 2005, 2006, 2007, 2004, 2005, 2006, 2007,2004, 2005, 2006, 2007], "dummy": [0,1,1,0,1,1,0,0,0,0,0,0]})

ID  year  dummy 
1   2004   0
1   2005   1
1   2006   1
1   2007   0
2   2004   1
2   2005   1
2   2006   0
2   2007   0
3   2004   0
3   2005   0
3   2006   0
3   2007   0

I want to generate a new column: 'dummy1' from 'dummy' column

ID  year  dummy  dummy1
1   2004   0       0
1   2005   1       1
1   2006   1       0
1   2007   0       0
2   2004   1       1
2   2005   1       0
2   2006   0       0
2   2007   0       0
3   2004   0       0
3   2005   0       0
3   2006   0       0
3   2007   0       0

If I use:

df['dummy1'] = df["dummy"].replace({"1": "0"}, inplace=False)

It replaces all the one to zero. How can I modify the code to replace only the second 1 to 0, for each ID?

Upvotes: 1

Views: 104

Answers (4)

Corralien
Corralien

Reputation: 120401

Without groupby, using drop_duplicates

df['dummy1'] = 0
df.loc[df[df['dummy'].eq(1)].drop_duplicates('ID').index, 'dummy1'] = 1

Output:

>>> df
    ID  year  dummy  dummy1
0    1  2004      0       0
1    1  2005      1       1
2    1  2006      1       0
3    1  2007      0       0
4    2  2004      1       1
5    2  2005      1       0
6    2  2006      0       0
7    2  2007      0       0
8    3  2004      0       0
9    3  2005      0       0
10   3  2006      0       0
11   3  2007      0       0

Upvotes: 1

Jui Sen
Jui Sen

Reputation: 377

df['dum'] = df.groupby('ID')['dummy'].cummax()

Then,

conditions = [
    (df['dum']== 0),
    (df['dum']== 1)
    ]
values = [0, (df.index == df.groupby('ID').dummy.transform('idxmax')).astype(int)]
df['dummy1'] = np.select(conditions, values)

ID  year  dummy  dummy1
1   2004   0       0
1   2005   1       1
1   2006   1       0
1   2007   0       0
2   2004   1       1
2   2005   1       0
2   2006   0       0
2   2007   0       0
3   2004   0       0
3   2005   0       0
3   2006   0       0
3   2007   0       0

Upvotes: 0

BENY
BENY

Reputation: 323226

Try with idxmax with transform

df['new'] = (df.index == df.groupby('ID').dummy.transform('idxmax')).astype(int)
df
   ID  year  dummy  new
0   1  2004      0    0
1   1  2005      1    1
2   1  2006      1    0
3   1  2007      0    0
4   2  2004      1    1
5   2  2005      1    0
6   2  2006      0    0
7   2  2007      0    0

Upvotes: 1

ThePyGuy
ThePyGuy

Reputation: 18406

Combine dummy value equals 1 and dummy value shifted by -1 equals 1 by and, then convert it integer type, finally assign it to the new column:

>>> df['dummy1'] = ((df['dummy'].eq(1)) & (df['dummy'].shift(-1).eq(1))).astype(int)

   ID  year  dummy  dummy1
0   1  2004      0       0
1   1  2005      1       1
2   1  2006      1       0
3   1  2007      0       0
4   2  2004      1       1
5   2  2005      1       0
6   2  2006      0       0
7   2  2007      0       0

Upvotes: 1

Related Questions