Quentin

Reputation: 446

Set first and last row of a column in a dataframe

I've been reading over this and still find the subject a little confusing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Say I have a pandas DataFrame and I want to set the first and last row elements of a single column to some values in one step. I can do this:

df.iloc[[0, -1]].mycol = [1, 2]

This triggers the warning "A value is trying to be set on a copy of a slice from a DataFrame", which suggests the assignment is potentially dangerous.

I could use .loc instead, but then I need to know the index labels of the first and last rows (in contrast, .iloc lets me access rows by position).

What's the safest Pandasy way to do this?

For context, this is how I build the DataFrame:

# Django queryset
query = market.stats_set.annotate(distance=F("end_date") - query_date)

# Generate a dataframe from this queryset, and order by distance
df = pd.DataFrame.from_records(query.values("distance", *fields), coerce_float=True)
df = df.sort_values("distance").reset_index(drop=True)

Then, I try calling df.distance.iloc[[0, -1]] = [1, 2]. This raises the warning.

Upvotes: 1

Views: 3875

Answers (2)

root

Reputation: 33793

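The issue isn't with iloc itself, it's the chained access: df.iloc[[0, -1]] returns a new object, and setting .mycol on it modifies that copy rather than df. You can do the whole assignment in a single iloc call: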

df.iloc[[0, -1], df.columns.get_loc('mycol')] = [1, 2]
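
As a rough, self-contained sketch of the line above: the frame, its string index, and the column name "other" are made up for illustration, and it assumes the column names are unique so that get_loc returns a single integer.

```python
import pandas as pd

# Hypothetical frame; the string index shows that row labels play no role here.
df = pd.DataFrame(
    {"mycol": [10, 20, 30, 40], "other": [1.0, 2.0, 3.0, 4.0]},
    index=["w", "x", "y", "z"],
)

# Translate the column label into its integer position, then assign to the
# first and last rows in a single .iloc call (no intermediate object).
col_pos = df.columns.get_loc("mycol")
df.iloc[[0, -1], col_pos] = [1, 2]

print(df)
```

Because rows and column are selected in one indexing call, the assignment goes to df directly and no warning should be raised.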

Usually ix is used if you want mixed integer- and label-based access, but it doesn't work in this case: -1 isn't actually in the index, and apparently ix isn't smart enough to treat it as the last position. (Note that ix was deprecated in pandas 0.20 and removed in 1.0.)
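
If you'd rather stay label-based, one possible alternative (not part of the answer above, just a sketch) is to look up the first and last labels by position with df.index[[0, -1]] and pass them to .loc. The frame below is hypothetical, and this assumes the index labels are unique; with duplicate labels, .loc would select every row carrying that label.

```python
import pandas as pd

# Hypothetical frame with non-integer labels.
df = pd.DataFrame({"mycol": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])

# Positional lookup on the index itself yields the first and last labels,
# so .loc can be used without knowing the labels in advance.
first_last = df.index[[0, -1]]
df.loc[first_last, "mycol"] = [1, 2]

print(df)
```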

Upvotes: 1

EdChum

Reputation: 394013

What you're doing is called chained indexing. You can use iloc on just that column to avoid the warning:

In [24]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[24]:
          a         b         c
0  1.589940  0.735713 -1.158907
1  0.485653  0.044611  0.070907
2  1.123221 -0.862393 -0.807051
3  0.338653 -0.734169 -0.070471
4  0.344794  1.095861 -1.300339

In [25]:
df['a'].iloc[[0,-1]] = 'foo'
df

Out[25]:
          a         b         c
0       foo  0.735713 -1.158907
1  0.485653  0.044611  0.070907
2   1.12322 -0.862393 -0.807051
3  0.338653 -0.734169 -0.070471
4       foo  1.095861 -1.300339

If you index the other way round (rows first, then the column), it raises the warning:

In [27]:
df.iloc[[0,-1]]['a'] = 'foo'

C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\IPython\kernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
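
To illustrate why the warning matters, here is a minimal sketch on a made-up frame. It relies on .iloc with a list of positions returning a new object, so the rows-then-column assignment should leave df untouched, whereas a single .iloc call writes to df itself. The warning is silenced only to keep the demo output clean.

```python
import warnings

import pandas as pd

# Hypothetical frame for the demo.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning here
    # Rows first, then column: the value is set on a temporary copy,
    # which is discarded straight away.
    df.iloc[[0, -1]]["a"] = 0.0

after_chained = df["a"].tolist()  # snapshot taken after the chained attempt

# Rows and column in one indexing call: the value is set on df itself.
df.iloc[[0, -1], df.columns.get_loc("a")] = 0.0
after_single = df["a"].tolist()

print(after_chained)
print(after_single)
```

So the first form can silently do nothing, which is what the warning is trying to flag.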

Upvotes: 1
