Pandas: Selecting row-range and column on a filtered dataframe

Question

Let's say I have data like this:

import pandas as pd

df = pd.DataFrame({'category': ["blue", "blue", "blue", "blue", "green"], 'val1': [5, 3, 2, 2, 5], 'val2': [1, 3, 2, 2, 5]})
print(df)

  category  val1  val2
0     blue     5     1
1     blue     3     3
2     blue     2     2
3     blue     2     2
4    green     5     5

I want to filter by category, then select a column and a row range, like this:

print(df.loc[df['category'] == 'blue'].loc[1:2, 'val1'])

1    3
2    2
Name: val1, dtype: int64

This works for selecting the data I am interested in, but when I try to overwrite part of my dataframe through the selection above, I get the SettingWithCopyWarning: "A value is trying to be set on a copy of a slice from a DataFrame."

I am familiar with this warning and I know it occurs when assigning through a chained selection like df.loc[rows][columns] instead of a single df.loc[rows, columns].

However, I can't figure out how to put all three things I am filtering for (a certain value for category, a certain column, and a certain row range) into a single .loc[...]. How can I select that part of the data in a way that lets me overwrite it in the original dataframe?

cs95 · Accepted Answer

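This makes sense because you are chaining two loc calls: the first one (the boolean filter) returns a copy, so the assignment in the second never reaches the original dataframe. My suggestion is to squash the two loc calls into one. You can do this by filtering, grabbing the index of the filtered rows, and using that index in a single loc: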

df.loc[df[df['category'].eq('blue')].index[1:3], 'val1'] = 123

Notice that I slice the filtered index with [1:3] instead of [1:2], because the end of a positional slice is exclusive (unlike loc, whose label-based slices include the end label).
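
To make the positional-versus-label distinction concrete, here is a minimal sketch that rebuilds the example dataframe from the question and computes the target rows both ways. The variable names (mask, positional_rows, label_rows) are only illustrative and do not appear in the original post. With the default integer index the two approaches happen to pick the same rows, but they would diverge if the filtered rows did not start at label 0.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ["blue", "blue", "blue", "blue", "green"],
    'val1': [5, 3, 2, 2, 5],
    'val2': [1, 3, 2, 2, 5],
})

# Boolean filter for the category of interest
mask = df['category'].eq('blue')

# Positional: the 2nd and 3rd of the filtered rows (end of slice is exclusive)
positional_rows = df[mask].index[1:3]

# Label-based: filtered rows whose labels lie between 1 and 2 (end is inclusive)
label_rows = df[mask].loc[1:2].index

# A single .loc call on the original dataframe, so the assignment sticks
df.loc[positional_rows, 'val1'] = 123
print(df)
```

Either index can be passed to the single df.loc[rows, 'val1'] call. Which one is appropriate depends on whether the row range is meant as positions within the filtered rows or as labels in the original index.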
