Set first and last row of a column in a dataframe

Question

I've been reading over this and still find the subject a little confusing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Say I have a Pandas DataFrame and I wish to simultaneously set the first and last row elements of a single column to some value. I can do this:

df.iloc[[0, -1]].mycol = [1, 2]

which produces the warning "A value is trying to be set on a copy of a slice from a DataFrame" and tells me that this is potentially dangerous.

I could use .loc instead, but then I need to know the index labels of the first and last rows (in contrast, .iloc allows me to access by position).

What's the safest, most idiomatic Pandas way to do this?

To get to this point :

# Django queryset
query = market.stats_set.annotate(distance=F("end_date") - query_date)

# Generate a dataframe from this queryset, and order by distance
df = pd.DataFrame.from_records(query.values("distance", *fields), coerce_float=True)
df = df.sort_values("distance").reset_index(drop=True)

Then, I try calling df.distance.iloc[[0, -1]] = [1, 2]. This raises the warning.

root · Accepted Answer

The issue isn't with iloc itself but with the chained access: df.iloc[[0, -1]] returns a copy, so assigning to .mycol on that result modifies the copy rather than df. You can do this all within iloc:

df.iloc[[0, -1], df.columns.get_loc('mycol')] = [1, 2]
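To make the one-liner above concrete, here is a minimal sketch on a toy frame. The column names, values, and string index are made up for illustration, and it assumes the column names are unique so that get_loc returns a single integer position.

```python
import pandas as pd

# Hypothetical toy data; the string index shows that no labels are needed.
df = pd.DataFrame(
    {"mycol": [10, 20, 30, 40], "other": ["a", "b", "c", "d"]},
    index=["w", "x", "y", "z"],
)

# Translate the column label into an integer position
# (a single int, assuming column names are unique).
col = df.columns.get_loc("mycol")

# One indexing operation on df itself, so no intermediate copy is involved.
df.iloc[[0, -1], col] = [1, 2]

print(df["mycol"].tolist())  # [1, 20, 30, 2]
```

Because rows and column are selected in a single iloc call, the assignment writes to df directly and the warning is not triggered.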

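Usually ix is used if you want mixed integer and label based access, but it doesn't work in this case since -1 isn't actually in the index, and apparently ix isn't smart enough to know it should be the last index. Note also that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so it is not an option on current versions.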
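If you want the mixed style (rows by position, column by label) without ix, one possible alternative is to look up the row labels by position through df.index and pass them to .loc. This is only a sketch on made-up data, and it assumes the index labels are unique; with duplicate labels, .loc would select every matching row.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({"mycol": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])

# Positional lookup on the index gives the labels of the first and last rows.
first_last = df.index[[0, -1]]

# Label-based assignment in a single .loc call on df itself.
df.loc[first_last, "mycol"] = [1, 2]

print(df["mycol"].tolist())  # [1, 20, 30, 2]
```

This keeps the column reference by name, at the cost of relying on unique index labels.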
