Reputation: 1601
I have the following line of code:
# slice off the last 4 chars in name wherever its code contains the substring '-CUT'
df['name'] = np.where(df['code'].str.contains('-CUT'),
                      df['name'].str[:-4], df['name'])
However, this doesn't seem to be working correctly. It slices off the last 4 characters for the correct rows, but it also does so for rows where the code is None/empty (which is almost all of them).
Is there anything obviously wrong with how I'm using np.where?
Upvotes: 2
Views: 170
Reputation: 164623
You can specify regex=False and na=False as parameters to pd.Series.str.contains so that only rows where your condition is met are updated:
df['name'] = np.where(df['code'].str.contains('-CUT', regex=False, na=False),
                      df['name'].str[:-4], df['name'])
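regex=False isn't strictly necessary for this criterion, but it should improve performance. na=False ensures that any value which cannot be processed via str methods, such as None or NaN, returns False. Without it, those rows yield NaN rather than False, and np.where treats NaN as truthy, which is why they were being sliced.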
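As a minimal sketch of the na=False fix, here is a small run on made-up data. The sample rows are hypothetical; only the column names name and code come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one matching code, one missing code, one non-matching code.
df = pd.DataFrame({
    "name": ["widget-CUT", "gadget", "sprocket"],
    "code": ["A1-CUT", None, "B2"],
})

# na=False makes the missing code evaluate to False instead of a missing value,
# so the mask is strictly boolean before it reaches np.where.
mask = df["code"].str.contains("-CUT", regex=False, na=False)

# Slice the last 4 characters only where the mask is True; keep the name otherwise.
df["name"] = np.where(mask, df["name"].str[:-4], df["name"])

print(mask.tolist())        # [True, False, False]
print(df["name"].tolist())  # ['widget', 'gadget', 'sprocket']
```

The row with the missing code keeps its name because its mask entry is False, not a missing value.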
Alternatively, you can use pd.DataFrame.loc. This seems more natural than specifying an "unchanged" series as a final argument to np.where:
mask = df['code'].str.contains('-CUT', regex=False, na=False)
df.loc[mask, 'name'] = df['name'].str[:-4]
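The .loc version can be checked the same way. This is a sketch on hypothetical sample rows, not data from the question; it shows that rows outside the mask are left untouched.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    "name": ["widget-CUT", "gadget", "sprocket"],
    "code": ["A1-CUT", None, "B2"],
})

# Boolean mask; the missing code becomes False because of na=False.
mask = df["code"].str.contains("-CUT", regex=False, na=False)

# Only rows where the mask is True are assigned. The right-hand side is
# aligned on the index, so the other rows are never written to.
df.loc[mask, "name"] = df["name"].str[:-4]

print(df["name"].tolist())  # ['widget', 'gadget', 'sprocket']
```

Because the assignment is restricted to the masked rows, there is no need to pass the original column back as a fallback value.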
Upvotes: 5