The problem, simplified:
I need to extract and modify particular rows of a DataFrame based on whether the text in a column contains a '-' character. The dash and everything after it should be removed, leaving only the text that preceded the '-'.
Have:
        textcol
0  no dash here
1    one - here
Want:
  textcol
0     one
Here is the code to reproduce my scenario:
import pandas as pd
df = pd.DataFrame(data=['no dash here', 'one - here'], index=[0, 1], columns=['textcol'])
# keep only the rows whose text contains a '-'
df2 = df[df['textcol'].str.contains('-')]
df2.loc[:, ['textcol']] = df2['textcol'].str.split('-').str[0]
The resulting DataFrame df2 gives the result I want, with one exception: whenever I run this (or anything derived from df2 afterward), I get the following SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I tried a different approach and got a similar warning telling me to use .loc[] instead, but with .loc I still get essentially the same warning.
Is there a better way to do this that doesn't trigger the warning? I'm worried that something is happening here that I don't understand and that df2 will eventually not contain what I expect. I'm also wondering whether something like .query() would work.
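On the .query() question: .query() can handle the row selection, as in the sketch below, but the result may still be flagged the same way on later assignment, so the sketch also takes an explicit .copy(). It assumes a pandas version whose DataFrame.query accepts engine='python', used here because the .str accessor call is generally not supported by the numexpr engine. The final .str.strip() is an addition: 'one - here'.split('-')[0] is 'one ' with a trailing space.

```python
import pandas as pd

df = pd.DataFrame(data=['no dash here', 'one - here'], index=[0, 1], columns=['textcol'])

# Select rows whose text contains '-' via query; engine='python' is assumed
# so that the .str.contains(...) call inside the expression is evaluated.
df2 = df.query('textcol.str.contains("-")', engine='python').copy()

# Keep the text before the first '-' and drop the trailing space.
df2['textcol'] = df2['textcol'].str.split('-').str[0].str.strip()
print(df2)
```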
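As mentioned by @EdChum, the issue is whether df2 is a view on df or a copy. With a boolean mask like this one, df[...] actually returns a copy, but pandas still flags df2 as a slice of df and cannot tell whether assigning into it was meant to modify df too, so it warns. If you want an independent copy, make it explicit with .copy() (see the docs) and the SettingWithCopyWarning disappears: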
# .copy() makes df2 an independent DataFrame, so assigning into it no longer warns
df2 = df[df['textcol'].str.contains('-')].copy()
df2.loc[:, ['textcol']] = df2['textcol'].str.split('-').str[0]
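A self-contained version of this fix, as a sketch: it records any warnings raised by the assignment and checks that none is a SettingWithCopyWarning (matched by class name, since pandas releases with copy-on-write enabled may not define or emit it). Two small departures from the code above: it assigns with df2['textcol'] = ... instead of .loc[:, ['textcol']], and it adds .str.strip() to remove the trailing space that split('-') leaves behind.

```python
import warnings
import pandas as pd

df = pd.DataFrame(data=['no dash here', 'one - here'], index=[0, 1], columns=['textcol'])

# Select the rows containing '-' and take an explicit copy.
df2 = df[df['textcol'].str.contains('-')].copy()

# Record every warning raised by the assignment below.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Keep the text before the first '-' ('one - here' -> 'one ')
    # and strip the trailing space.
    df2['textcol'] = df2['textcol'].str.split('-').str[0].str.strip()

# Collect any SettingWithCopyWarning by name (expected to be empty).
copy_warnings = [w for w in caught if w.category.__name__ == 'SettingWithCopyWarning']
print(df2)
print(len(copy_warnings))
```

The filtered frame keeps the original index label (1 here); chain .reset_index(drop=True) if you want it renumbered from 0 as in the "want" output.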
See "Returning a view versus a copy" in the pandas docs.
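If you would rather not assign into a selection at all, one alternative (a sketch, not the only option) is to build df2 in a single expression: select the rows with .loc and derive the new column with DataFrame.assign, which returns a new DataFrame instead of writing into the selected slice. Passing regex=False makes str.contains treat '-' as a literal character, and .str.strip() is again an addition that drops the trailing space.

```python
import pandas as pd

df = pd.DataFrame(data=['no dash here', 'one - here'], index=[0, 1], columns=['textcol'])

# Boolean mask of rows whose text contains a literal '-'.
mask = df['textcol'].str.contains('-', regex=False)

# Select with .loc, then let assign build a new frame with the trimmed text,
# so nothing is ever written into the slice itself.
df2 = df.loc[mask].assign(
    textcol=lambda d: d['textcol'].str.split('-').str[0].str.strip()
)
print(df2)
```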