JohnE

Reputation: 30444

Setting values on a subset of rows (indexing, boolean setting)

This is a followup to:

What is correct syntax to swap column values for selected rows in a pandas data frame using just one line?

No credit will be given for a workaround, only for an explanation. There are already three workarounds provided as answers to the original question.

There are three examples of setting values below. How can the different results be explained? Why do #1 and #2 fail in different ways while #3 works fine? Is this behavior a bug, or is it the way things are supposed to work?

import numpy as np
from pandas import DataFrame

df = DataFrame({'L':   ['left', 'right', 'left', 'right'],
                'R':   ['right', 'left', 'right', 'left'],
                'idx': [False, True, False, True],
                'num': np.arange(4) })

df1 = df.copy()
df2 = df.copy()
df3 = df.copy()

#1 nothing happens
df1.loc[df1.idx,['L','R']] = df1.loc[df1.idx,['R','L']]

#2 weird results
df2.loc[df2.idx,['L','R']] = df2[['R','L']]

#3 similar to #2, works fine
df3.loc[df3.idx,['L','R']] = df3['num']

data before and after:

df
       L      R    idx  num
0   left  right  False    0
1  right   left   True    1
2   left  right  False    2
3  right   left   True    3

df1
       L      R    idx  num
0   left  right  False    0
1  right   left   True    1
2   left  right  False    2
3  right   left   True    3

df2
       L      R    idx  num
0   left  right  False    0
1   left  right   True    1
2   left  right  False    2
3  right   left   True    3

df3
       L      R    idx  num
0   left  right  False    0
1      1      1   True    1
2   left  right  False    2
3      3      3   True    3

Upvotes: 0

Views: 361

Answers (1)

Jeff

Reputation: 129018

Pandas aligns the right-hand side of a setting operation to the left-hand side. It then takes the left-hand side mask and sets the selected values equal to the aligned right-hand side.

This is the left-hand indexer. The right-hand side has to end up with this same shape (or be broadcastable to it).

In [61]: df1.loc[df1.idx,['L','R']] 
Out[61]: 
       L     R
1  right  left
3  right  left

Here is the first example. I am only going to show the right-hand alignment (the y).

In [49]: x, y = df1.align(df1.loc[df1.idx,['R','L']])

In [51]: y
Out[51]: 
       L     R  idx  num
0    NaN   NaN  NaN  NaN
1  right  left  NaN  NaN
2    NaN   NaN  NaN  NaN
3  right  left  NaN  NaN

So even though you reversed the columns on the right-hand side, aligning put them back in order. You are therefore setting the same values, hence no change.
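To make the effect of column alignment concrete, here is a minimal sketch that contrasts a labelled right-hand side with an unlabelled one. It is only meant to illustrate the explanation above; the names `mask`, `by_label` and `by_position` are made up for this example, and the behavior shown assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "L": ["left", "right", "left", "right"],
    "R": ["right", "left", "right", "left"],
    "idx": [False, True, False, True],
    "num": np.arange(4),
})
mask = df["idx"]

# Labelled right-hand side: the values are matched to the target by
# column name, so "L" receives "L" and "R" receives "R" (no swap).
by_label = df.copy()
by_label.loc[mask, ["L", "R"]] = by_label.loc[mask, ["R", "L"]]

# Unlabelled right-hand side: a plain NumPy array has no labels to align
# on, so the values are written in the order they appear (swap happens).
by_position = df.copy()
by_position.loc[mask, ["L", "R"]] = by_position.loc[mask, ["R", "L"]].to_numpy()

print(by_label[["L", "R"]])
print(by_position[["L", "R"]])
```

The only difference between the two assignments is whether the right-hand side carries column labels, which is what decides whether alignment can undo the reversed column order.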

In [63]: x, y = df2.align(df2[['R','L']])

In [65]: y
Out[65]: 
       L      R  idx  num
0   left  right  NaN  NaN
1  right   left  NaN  NaN
2   left  right  NaN  NaN
3  right   left  NaN  NaN

Notice the difference from the above. This is still a full frame (it is not sub-selected), so the shape of the right-hand side is now != the left-hand shape, unlike in the example above.

There is a reindexing step at this point. It might be a bug, as I think this should come out the same as your first example. Please file a bug report for this (e.g. your example using df1 and df2). Both should come out == df after the assignment.
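The expected outcome can be reasoned about with an explicit `reindex`. The sketch below is a simplified model of "conform the right-hand side to the left-hand labels", not a copy of what pandas does internally; `rows`, `cols` and `rhs` are names chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "L": ["left", "right", "left", "right"],
    "R": ["right", "left", "right", "left"],
    "idx": [False, True, False, True],
    "num": np.arange(4),
})

rows = df.index[df["idx"]]  # row labels selected by the mask
cols = ["L", "R"]           # column labels on the left-hand side

# Conform the full, column-reversed frame to the left-hand labels.
rhs = df[["R", "L"]].reindex(index=rows, columns=cols)

print(rhs)
# If this prints True, assigning rhs writes back the values already there.
print(rhs.equals(df.loc[rows, cols]))
```

Under this model the second example reduces to the first one, which is why both would be expected to leave the frame unchanged.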

In [58]: x, y = df1.align(df3['num'],axis=0)

In [60]: y
Out[60]: 
0    0
1    1
2    2
3    3
Name: num, dtype: int64

This one simply broadcasts the values to the left-hand side. That's why the numbers are propagated.

Bottom line: pandas attempts to figure out the right-hand side in the assignment. There are a lot of cases.
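A small sketch of the Series case follows. One assumption is added that is not in the original example: the text columns are cast to `object` first, so that they can hold integers regardless of the default string dtype of the installed pandas version. The names `mask` and `out` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "L": ["left", "right", "left", "right"],
    "R": ["right", "left", "right", "left"],
    "idx": [False, True, False, True],
    "num": np.arange(4),
})
# Assumption: object columns, so integers can be stored next to strings.
df = df.astype({"L": object, "R": object})
mask = df["idx"]

# A Series has no columns, so it is aligned on the row index only.
_, y = df.align(df["num"], axis=0)
print(y)

# The row-aligned values are then repeated across both target columns.
out = df.copy()
out.loc[mask, ["L", "R"]] = out["num"]
print(out)
```

Because there are no column labels to match, each selected row receives its own `num` value in every target column.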

Upvotes: 2
