AnnetteC

Reputation: 506

Pandas: Get SettingWithCopyWarning when using set_categories

I have two data frames. Both have the same set of columns, but some columns are categorical (based on the values they actually contain). To combine them, I reset the categories of each categorical column to the union of the categories from both frames.

def appendDFsWithCat(df1, df2):
    columns = df1.select_dtypes(include=['category']).columns
    for c in columns:
        catValues1 = list(df1[c].cat.categories)
        catValues2 = list(df2[c].cat.categories)
        catValues = list(set(catValues1 + catValues2))
        df1[c] = df1[c].cat.set_categories(catValues)
        df2[c] = df2[c].cat.set_categories(catValues)
    return df1.append(df2, ignore_index=True).reset_index(drop=True)

Everything works as expected, but I would like to understand why a SettingWithCopyWarning is raised when executing this line:

df1[c] = df1[c].cat.set_categories(catValues)
Utility.py:149: SettingWithCopyWarning:

I found no other way to refresh the categories than the one used above.

Upvotes: 1

Views: 1310

Answers (1)

piRSquared

Reputation: 294506

This is most likely happening because of the objects you are passing to your function.

If I set up the following example:

import numpy as np
import pandas as pd

cats1 = pd.Series(['a', 'a', 'b', 'b'], name='cat', dtype="category")
data1 = pd.Series([1, 2, 3, 4], name='val', dtype=np.int64)
df1 = pd.concat([cats1, data1], axis=1)

and run your function:

print(appendDFsWithCat(df1, df1))

I get no error and this output:

  cat  val
0   a    1
1   a    2
2   b    3
3   b    4
4   a    1
5   a    2
6   b    3
7   b    4

However, if I run this:

print(appendDFsWithCat(df1.iloc[:-1], df1))

I get the following warning:

C:\Anaconda2\lib\site-packages\ipykernel\__main__.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

And this output:

  cat  val
0   a    1
1   a    2
2   b    3
3   a    1
4   a    2
5   b    3
6   b    4

If you read the warning, it tells you that you are trying to set values on an object that is a slice, or view, of another object. That means the dataframe you are assigning to may be only a reference to another dataframe, so pandas cannot tell whether you meant to change the original as well. I manufactured this situation by passing the function df1.iloc[:-1], which I knew was a slice or view of df1.
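To see the difference between a slice and an independent copy in isolation, here is a minimal sketch. The helper name count_copy_warnings is made up for illustration, and whether the sliced case actually warns depends on the pandas version: releases with Copy-on-Write enabled by default no longer emit SettingWithCopyWarning, so treat the first count as version-dependent.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    "cat": pd.Series(["a", "a", "b", "b"], dtype="category"),
    "val": [1, 2, 3, 4],
})


def count_copy_warnings(frame):
    """Assign a column on `frame` and count SettingWithCopyWarning occurrences.

    Hypothetical helper, not part of pandas. The warning is matched by class
    name so the sketch also runs on versions where the class is gone.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame["cat"] = frame["cat"].cat.set_categories(["a", "b", "c"])
    return sum(
        1 for w in caught if w.category.__name__ == "SettingWithCopyWarning"
    )


# A slice of df: older pandas versions remember the parent and may warn.
n_slice = count_copy_warnings(df.iloc[:-1])

# An explicit copy is an independent object, so there is nothing to warn about.
n_copy = count_copy_warnings(df.iloc[:-1].copy())

print(n_slice, n_copy)
```

In both cases the original df keeps its categories, because assigning a whole column only replaces that column on the object being assigned to.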

You can get around this by forcing the objects to be independent copies, like this:

Solution


def appendDFsWithCat(df1, df2):
    # I added this line to ensure they are their own dataframes
    df1, df2 = df1.copy(), df2.copy()
    columns = df1.select_dtypes(include=['category']).columns
    for c in columns:
        catValues1 = list(df1[c].cat.categories)
        catValues2 = list(df2[c].cat.categories)
        catValues = list(set(catValues1 + catValues2))
        df1[c] = df1[c].cat.set_categories(catValues)
        df2[c] = df2[c].cat.set_categories(catValues)
    return df1.append(df2, ignore_index=True).reset_index(drop=True)

Now when I run:

print(appendDFsWithCat(df1.iloc[:-1], df1))

I get:

  cat  val
0   a    1
1   a    2
2   b    3
3   a    1
4   a    2
5   b    3
6   b    4

With no warnings.
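One caveat for newer installations: DataFrame.append was removed in pandas 2.0, so the function above will not run there as written. A rough equivalent using pd.concat might look like the sketch below. The name concat_with_union_categories and the sample frames are made up for illustration, and it assumes, like the original, that both frames have the same categorical columns.

```python
import pandas as pd


def concat_with_union_categories(df1, df2):
    """Sketch of the same idea using pd.concat instead of DataFrame.append."""
    # Work on copies so slices passed in by the caller are never written to.
    df1, df2 = df1.copy(), df2.copy()
    for c in df1.select_dtypes(include=["category"]).columns:
        # Union of both category sets, as a pandas Index.
        cats = df1[c].cat.categories.union(df2[c].cat.categories)
        df1[c] = df1[c].cat.set_categories(cats)
        df2[c] = df2[c].cat.set_categories(cats)
    # With identical categories on both sides, concat keeps the category dtype.
    return pd.concat([df1, df2], ignore_index=True)


df_a = pd.DataFrame({
    "cat": pd.Series(["a", "a", "b"], dtype="category"),
    "val": [1, 2, 3],
})
df_b = pd.DataFrame({
    "cat": pd.Series(["b", "c"], dtype="category"),
    "val": [4, 5],
})

combined = concat_with_union_categories(df_a.iloc[:-1], df_b)
print(combined)
print(combined["cat"].cat.categories)
```

Because the copies are made first, passing a slice such as df_a.iloc[:-1] is safe, and the caller's frames keep their original categories.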

Upvotes: 1
