Lisle

Reputation: 1690

SettingWithCopyWarning while using .loc

Simplified version of the problem:

I need to extract and modify particular rows of a DataFrame based on whether the text in a column contains a '-' character. The dash and everything after it should be removed, leaving only the text that preceded the '-'.

have:
     textcol
0    no dash here
1    one - here

want:
     textcol
0    one

Here is the code that reproduces my scenario:

import pandas as pd

df = pd.DataFrame(data=['no dash here', 'one - here'], index=[0, 1], columns=['textcol'])
df2 = df[df['textcol'].str.contains('-') == True]
df2.loc[:, ['textcol']] = df2['textcol'].str.split('-').str[0]

The resulting DataFrame df2 yields the result that I desire, with one exception. Every time I call df2 (or any derivative thereafter) I receive the following SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I tried to accomplish this a different way and got a similar warning that told me to use the .loc indexer instead, but I'm still receiving the warning with .loc.

Is there a better way to get this result without triggering the warning? I'm afraid something is happening here that I don't understand and that df2 will eventually not contain what I want. I am also wondering whether something like .query() would work.

Upvotes: 2

Views: 1643

Answers (1)

Stefan

Reputation: 42875

As mentioned by @EdChum, pandas treats df2 as a slice of df rather than as an independent DataFrame, so assigning to it triggers the warning. If you want an independent copy, call .copy() explicitly (see docs) and the SettingWithCopyWarning disappears:

df2 = df[df['textcol'].str.contains('-') == True].copy()
df2.loc[:, ['textcol']] = df2['textcol'].str.split('-').str[0]

See returning a view vs copy in the pandas docs.
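
For completeness, here is a minimal, self-contained sketch of the .copy() approach above. Two details are my own assumptions rather than part of the question: na=False replaces the == True comparison (it makes the mask strictly boolean even if the column contains missing values), and .str.strip() removes the space left in front of the dash so the result is 'one' rather than 'one '.

```python
import pandas as pd

df = pd.DataFrame({"textcol": ["no dash here", "one - here"]})

# Assumption: na=False stands in for "== True" by treating missing values
# as non-matches; regex=False searches for a literal '-'.
mask = df["textcol"].str.contains("-", regex=False, na=False)

# .copy() makes df2 independent of df, so assigning to it is unambiguous
# and pandas has nothing to warn about.
df2 = df[mask].copy()

# Keep the text before the first dash.
# Assumption: trailing whitespace is unwanted, hence .str.strip().
df2["textcol"] = df2["textcol"].str.split("-").str[0].str.strip()

print(df2)
```

Note that the row keeps its original index label (1), and df itself is left unchanged.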

Upvotes: 6
