Winawer

Reputation: 711

Pandas copying using iloc not working as expected

I'm relatively new to pandas, so I expect I simply haven't grasped it well enough yet. I'm trying to copy a DataFrame while reordering its rows according to an externally supplied mapping (there's a good but irrelevant reason for setting df2 to NaN first). When I do it as a single operation using .iloc, the ordering is ignored, but if I loop and assign one row at a time, it works as I expected. Can anyone explain where I'm going wrong in this MWE? (More efficient or elegant ways of doing this are also welcome.)

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[100,200,300,400]]).T
df1.columns = ['A']

df2 = df1.copy()
df2[:] = np.nan

assign = np.array([[0,0],[1,1],[3,2],[2,3]])

print(df1)

# This does not work:
# df2.iloc[assign[:,1]] = df1.iloc[assign[:,0]]
# Output:
#      A
# 0  100
# 1  200
# 2  300
# 3  400
#
#      A
# 0  100
# 1  200
# 2  300
# 3  400

# This does:
for x in assign:
  df2.iloc[x[1]] = df1.iloc[x[0]]
# Output:
#      A
# 0  100
# 1  200
# 2  300
# 3  400
#
#      A
# 0  100
# 1  200
# 2  400
# 3  300

print(df2)

Upvotes: 1

Views: 6220

Answers (1)

CT Zhu

Reputation: 54340

It would take a pandas developer to explain why it works this way, but I do know that the following solution will get you there (pandas 0.13.1):

In [179]:
df2.iloc[assign[:,1]] = df1.iloc[assign[:,0]].values
print(df2)

Out[179]:
     A
0  100
1  200
2  400
3  300

[4 rows x 1 columns] 

As @Jeff pointed out, in df2.iloc[assign[:,1]] = df1.iloc[assign[:,0]] both sides are pandas objects, so the assignment aligns them on their index labels: each value goes to the row with the matching label, which undoes the reordering. With df2.iloc[assign[:,1]] = df1.iloc[assign[:,0]].values, the right-hand side is a plain NumPy array, which has no index to match, so the values are assigned by position.
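To make the difference concrete, here is a minimal sketch written for a recent pandas rather than 0.13.1, so treat the details as assumptions. It uses .to_numpy(), the newer spelling of .values, and builds the NaN frames directly as float columns. The names src, dst, aligned and positional are mine, not from the question. I use .loc for the aligned case because, as far as I know, whether .iloc aligns a DataFrame on the right-hand side has varied between pandas versions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [100, 200, 300, 400]})
assign = np.array([[0, 0], [1, 1], [3, 2], [2, 3]])
src, dst = assign[:, 0], assign[:, 1]  # source rows, destination rows

# Assigning a DataFrame through .loc: rows are matched on index labels,
# so the reordering applied on the right-hand side is undone.
aligned = pd.DataFrame(np.nan, index=df1.index, columns=df1.columns)
aligned.loc[dst] = df1.loc[src]

# Assigning a bare NumPy array: there are no labels to match,
# so rows are written purely by position.
positional = pd.DataFrame(np.nan, index=df1.index, columns=df1.columns)
positional.iloc[dst] = df1.iloc[src].to_numpy()

print(aligned["A"].tolist())
print(positional["A"].tolist())
```

Here positional should come out in the order 100, 200, 400, 300, the same as the row-by-row loop in the question, while aligned is expected to keep the original order.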

Also consider the following example, which illustrates the index-matching behavior.

In [208]:
# This works: values are matched on index labels, and unmatched rows become NaN
df1['B'] = pd.Series({0: 'a', 3: 'b', 2: 'c'})
print(df1)
     A    B
0  100    a
1  200  NaN
2  300    c
3  400    b

[4 rows x 2 columns]

In [209]:
# This fails: the list has one element fewer than df1 has rows, and a list has no index to align on
df1['B'] = ['a', 'b', 'c']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

Upvotes: 2
