Reputation: 28565
I have hundreds of thousands of rows that look something like this (there's actually more data than just this, but I'm trying to simplify the idea I've been attempting)...
index status location
0 infected area5
1 healthy area6
2 healthy area3
3 infected area8
4 healthy area1
5 healthy area8
6 healthy area5
7 healthy area2
8 healthy area4
9 healthy area10
10 .... ....
I'm trying to update the status column based on whether an area is infected. So I made a list of the infected areas:
infected_areas = ['area5', 'area8']
Then I want to look at all the rows (or really just the 'healthy' rows) and, if a row's location matches anything in my infected_areas list, change its status to infected.
So with my example above, the output should look like:
index status location
0 infected area5
1 healthy area6
2 healthy area3
3 infected area8
4 healthy area1
5 infected area8
6 infected area5
7 healthy area2
8 healthy area4
9 healthy area10
10 .... ....
Here's what I've been working with, but I'm not quite getting anywhere:
df[df['location'].isin(location)]['status'] = 'infected'
Upvotes: 1
Views: 286
Reputation: 323226
Just use .loc:
df.loc[df.location.isin(infected_areas),'status']='infected'
df
Out[49]:
index status location
0 0 infected area5
1 1 healthy area6
2 2 healthy area3
3 3 infected area8
4 4 healthy area1
5 5 infected area8
6 6 infected area5
7 7 healthy area2
8 8 healthy area4
9 9 healthy area10
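For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the ten sample rows from the question and the column names shown there; the variable name in_infected_area is only illustrative. The likely reason the chained form in the question has no effect is that df[mask] returns a temporary copy, so the assignment never reaches df, whereas a single .loc call selects rows and column in one step.

```python
import pandas as pd

# Sample data copied from the question (first ten rows only).
df = pd.DataFrame({
    "status": ["infected", "healthy", "healthy", "infected", "healthy",
               "healthy", "healthy", "healthy", "healthy", "healthy"],
    "location": ["area5", "area6", "area3", "area8", "area1",
                 "area8", "area5", "area2", "area4", "area10"],
})
infected_areas = ["area5", "area8"]

# Boolean mask: True for every row whose location is in the infected list.
in_infected_area = df["location"].isin(infected_areas)

# One .loc call selects rows and column together, so the assignment
# writes to df itself rather than to a temporary copy.
df.loc[in_infected_area, "status"] = "infected"

print(df)
```

Rows that are already infected are simply overwritten with the same value, so there is no need to filter for 'healthy' first.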
Upvotes: 4
Reputation: 76297
You can use pd.Series.isin in conjunction with pd.Series.where:
infected_areas = ['area5', 'area8']
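# Assign the result back to the column; where(..., inplace=True) on
# df.status may not update df in newer pandas versions (copy-on-write).
df['status'] = df['status'].where(
    ~df['location'].str.strip().isin(infected_areas),
    other='infected')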
>>> df
index status location
0 0 infected area5
1 1 healthy area6
2 2 healthy area3
3 3 infected area8
4 4 healthy area1
5 5 infected area8
6 6 infected area5
7 7 healthy area2
8 8 healthy area4
9 9 healthy area10
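As a variation on the same idea, pd.Series.mask replaces values where the condition is True, so the negation is not needed. This is a minimal sketch, assuming the ten sample rows from the question; the result is assigned back to the column rather than modified in place, and str.strip() is kept only as a guard against stray whitespace in the location values.

```python
import pandas as pd

# Sample data copied from the question (first ten rows only).
df = pd.DataFrame({
    "status": ["infected", "healthy", "healthy", "infected", "healthy",
               "healthy", "healthy", "healthy", "healthy", "healthy"],
    "location": ["area5", "area6", "area3", "area8", "area1",
                 "area8", "area5", "area2", "area4", "area10"],
})
infected_areas = ["area5", "area8"]

# mask() swaps in 'infected' wherever the condition is True and keeps
# the existing status everywhere else.
df["status"] = df["status"].mask(
    df["location"].str.strip().isin(infected_areas),
    "infected",
)

print(df)
```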
Upvotes: 3