Reputation: 340
I have a pandas DataFrame in Python like the following:
<table style="width:100%">
<tr>
<th>ID</th>
<th>AGE</th>
<th>GENDER</th>
<th>TIME</th>
<th>CODE</th>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>3</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>3</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>5</td>
<td>1</td>
</tr>
</table>
ID AGE GENDER TIME CODE
1 66 M 1 1
1 66 M 2 1
1 66 M 3 1
2 20 F 1 0
2 20 F 2 0
2 20 F 3 0
2 20 F 4 0
3 18 F 1 1
3 18 F 2 1
3 18 F 3 1
3 18 F 4 1
I need to change the last column as follows: for every ID where the 'CODE' column is 1, keep 1 only in the last row of that ID and set the earlier rows to 0.
<table style="width:100%">
<tr>
<th>ID</th>
<th>AGE</th>
<th>GENDER</th>
<th>TIME</th>
<th>CODE</th>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>4</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>5</td>
<td>1</td>
</tr>
</table>
How can I do this using pandas?
After looking it up, I found this line of code, which removes the last row of each group:
dfnew = (df.groupby('ID').apply(lambda x: x.iloc[:-1] if len(x)>1 else x))
Thanks in advance
Upvotes: 1
Views: 70
Reputation: 863611
Get the index of the rows where CODE is 1, then keep only the last such row per ID with drop_duplicates:
i = df[df['CODE'] == 1].drop_duplicates(subset=['ID'], keep='last').index
Set the whole column to 0 first, then set the rows at index i back to 1:
df['CODE'] = 0
df.loc[i, 'CODE'] = 1
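For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frame is an assumption: it is rebuilt from the 11-row output printed in this answer, not taken from the asker's real data.

```python
import pandas as pd

# Sample data (assumed): rebuilt from the 11-row frame printed in this answer.
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "AGE":    [66, 66, 66, 20, 20, 20, 20, 18, 18, 18, 18],
    "GENDER": ["M"] * 3 + ["F"] * 8,
    "TIME":   [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4],
    "CODE":   [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Index labels of the last CODE == 1 row for each ID.
i = df[df["CODE"] == 1].drop_duplicates(subset=["ID"], keep="last").index

# Reset the whole column, then flag only those rows.
df["CODE"] = 0
df.loc[i, "CODE"] = 1

print(df["CODE"].tolist())
```

Note that this picks the last row of each ID in the frame's current row order, so it relies on the rows already being sorted by TIME within each ID.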
Another solution is to create a boolean mask and convert it to integers:
m = (df['CODE'] == 1) & ~df['ID'].duplicated(keep='last')
print(m)
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 True
dtype: bool
df['CODE'] = m.astype(int)
print(df)
ID AGE GENDER TIME CODE
0 1 66 M 1 0
1 1 66 M 2 0
2 1 66 M 3 1
3 2 20 F 1 0
4 2 20 F 2 0
5 2 20 F 3 0
6 2 20 F 4 0
7 3 18 F 1 0
8 3 18 F 2 0
9 3 18 F 3 0
10 3 18 F 4 1
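The mask approach depends on row order, because duplicated(keep='last') treats whichever row comes last in the frame as the "last" one for its ID. If the rows might not be sorted by TIME, something like the sketch below could help. The function name flag_last_row and the shuffled sample frame are hypothetical, made up for illustration only.

```python
import pandas as pd

def flag_last_row(df, id_col="ID", time_col="TIME", code_col="CODE"):
    """Hypothetical helper: return a copy where code_col is 1 only on the
    latest row (by time_col) of each ID, and only if that row's code was 1."""
    # sort_values returns a new frame, so the caller's data is left untouched.
    out = df.sort_values([id_col, time_col], kind="stable")
    mask = (out[code_col] == 1) & ~out[id_col].duplicated(keep="last")
    out[code_col] = mask.astype(int)
    return out.sort_index()  # restore the original row order

# Deliberately unsorted sample data (assumed, for illustration).
shuffled = pd.DataFrame({
    "ID":   [3, 1, 3, 2, 1, 2],
    "TIME": [2, 1, 1, 1, 2, 2],
    "CODE": [1, 1, 1, 0, 1, 0],
})

result = flag_last_row(shuffled)
print(result)
```

Like the mask above, this checks the code of the final row per ID. That gives the same result as the drop_duplicates approach whenever CODE is constant within each ID, as it is in the question's data.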
Upvotes: 1