Demaunt

Reputation: 1243

Python adding column to dataframe causes NaN

I have a Series and a DataFrame:

import pandas as pd

s = pd.Series([1, 2, 3, 5])
df = pd.DataFrame()

When I add columns to df like this:

df.loc[:, "0-2"] = s.iloc[0:3]
df.loc[:, "1-3"] = s.iloc[1:4]

I get this df:

   0-2  1-3
0    1  NaN
1    2  2.0
2    3  3.0

Why am I getting NaN? I tried creating a new series with the correct index, but adding it to df still produces NaN.

What I want is

   0-2  1-3
0    1  2
1    2  3
2    3  5

Upvotes: 3

Views: 4395

Answers (1)

3novak

Reputation: 2544

Try either of the following lines.

df.loc[:, "1-3"] = s.iloc[1:4].values
# -OR-
df.loc[:, "1-3"] = s.iloc[1:4].reset_index(drop=True)

Your original code aligns the index of the data frame df with the index of the subset series s.iloc[1:4]. The subset's index labels are 1, 2 and 3, so nothing matches row 0 of df, and pandas places a NaN there. The value at label 3 is dropped because df has no row 3, and the NaN forces the column to float, which is why you see 2.0 and 3.0. You can get around this either by keeping only the values, so there is no index to match on, or by resetting the index of the subset series.
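To see the alignment in isolation, here is a minimal sketch. It assumes df already holds the "0-2" column, built here directly from a dict rather than with df.loc as in the question, and the names subset and aligned are only for illustration. Reindexing the subset to df's index by hand gives the same values that the column assignment produces.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 5])
df = pd.DataFrame({"0-2": s.iloc[0:3]})   # df.index is 0, 1, 2

subset = s.iloc[1:4]                      # index labels are 1, 2, 3

# Align the subset to df's index by hand: label 0 has no match, so it
# becomes NaN, and label 3 is discarded because df has no such row.
aligned = subset.reindex(df.index)
print(aligned)

# Assigning a Series as a column performs the same label alignment.
df["1-3"] = subset
print(df)
```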

>>> s.iloc[1:4]
1    2
2    3
3    5
dtype: int64

Notice the index values: the subset keeps the labels 1, 2 and 3 from the original, unsliced series, which is the following.

>>> s
0    1
1    2
2    3
3    5
dtype: int64

The index of the first row in df is 0. Calling .values drops the index, so there is nothing to match on and no NaN is produced. Resetting the index, as in the second option, relabels the subset as 0, 1 and 2, which matches the index of df.
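Putting both options side by side, as a rough sketch: it assumes a fresh frame for each option (df_values and df_reset are illustrative names), uses plain df[...] = assignment instead of df.loc, and swaps in to_numpy() for .values, which returns the same array here.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 5])

# Option 1: pass a bare NumPy array, so there is no index to align on.
df_values = pd.DataFrame({"0-2": s.iloc[0:3]})
df_values["1-3"] = s.iloc[1:4].to_numpy()

# Option 2: relabel the subset as 0, 1, 2 so its index matches df's.
df_reset = pd.DataFrame({"0-2": s.iloc[0:3]})
df_reset["1-3"] = s.iloc[1:4].reset_index(drop=True)

print(df_values)
print(df_reset)
```

Note that the array option only works when the array length equals the number of rows in df, whereas the reset_index option still goes through alignment and would fill any unmatched rows with NaN.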

Upvotes: 4
