Reputation: 195
I have a DataFrame object df, and I would like to modify the job column so that all retired people are 1 and the rest are 0 (as shown here):
df['job'] = df['job'].apply(lambda x: 1 if x == "retired" else 0)
But I get a warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Why did I get it here, though? From what I read, it applies to situations where I take a slice of rows and then a column, but here I am just modifying the elements of a column. Is there a better way to do that?
Upvotes: 0
Views: 328
Reputation: 709
I would not suggest using apply here, because on a large DataFrame it can hurt performance.
I would prefer using numpy.select or numpy.where.
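A minimal sketch of both, assuming a job column like the one in the question (the sample values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'job' column name comes from the question.
df = pd.DataFrame({"job": ["retired", "a", "b", "retired"]})

# numpy.where: one condition, one value where it is True, another where it is False.
flag_where = np.where(df["job"] == "retired", 1, 0)

# numpy.select: a list of conditions with matching choices, plus a default
# for rows that match none of them.
flag_select = np.select([df["job"] == "retired"], [1], default=0)

# Both produce the same array here; assign either one back to the column.
df["job"] = flag_where
print(df["job"].tolist())
```

With a single condition, numpy.where is the simpler choice; numpy.select becomes useful once there are several categories to map.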
Upvotes: 0
Reputation: 39840
So here's an example dataframe:
import pandas as pd
import numpy as np
data = {'job':['retired', 'a', 'b', 'retired']}
df = pd.DataFrame(data)
print(df)
       job
0  retired
1        a
2        b
3  retired
Now, you can make use of numpy's where function:
df['job'] = np.where(df['job']=='retired', 1, 0)
print(df)
   job
0    1
1    0
2    0
3    1
Upvotes: 0
Reputation: 30920
Use:
df['job'] = df['job'].eq('retired').astype(int)
or
df['job'] = np.where(df['job'].eq('retired'), 1, 0)
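As for the warning itself: it usually appears when df was created earlier by slicing or filtering another DataFrame, which the question does not show. A sketch under that assumption (the raw frame and its age column are hypothetical), where taking an explicit copy makes df independent before the column is overwritten:

```python
import pandas as pd

# Hypothetical parent frame; the question does not show how df was built.
raw = pd.DataFrame({
    "job": ["retired", "a", "b", "retired"],
    "age": [70, 30, 17, 66],
})

# Filtering returns a new object derived from raw; .copy() makes it an
# explicitly independent DataFrame, so later assignments are unambiguous.
df = raw[raw["age"] >= 18].copy()

# Replace the column with a 0/1 flag; raw is left untouched.
df["job"] = df["job"].eq("retired").astype(int)
print(df["job"].tolist())
```

If df really is a fresh DataFrame and not derived from another one, the assignment in the question should not need this step.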
Upvotes: 2