Parapapuppolo

Reputation: 41

Pandas DataFrame column (Series) has a different index than the DataFrame?

Consider this small script:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3]})
b = df.a
b.index = b.index + 1
df['b'] = b
print(df)
print(df.a - df.b)

The output is:

   a    b
0  1  NaN
1  2  1.0
2  3  2.0

0    NaN
1    0.0
2    0.0
3    NaN

But I was expecting df.a - df.b to be:

0    NaN
1    1.0
2    1.0

How is this possible? Is it a Pandas bug?

Upvotes: 4

Views: 1569

Answers (3)

Pythonic2020

Reputation: 186

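Use .copy() to get what you expect. Without it, bb is the very same object that aa.a returns (at least in the pandas version used in the question), so changing bb.index also changes the index of aa.a: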

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a.copy()
bb.index = bb.index + 1
aa['b'] = bb
print(aa)
print(aa.a - aa.b)
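
If the goal is simply "the previous row's value of a", a possible alternative is Series.shift, which moves the values and leaves the index alone. This is only a sketch under that assumption about the intent; the name diff is made up for illustration.

```python
import pandas as pd

aa = pd.DataFrame({'a': [1, 2, 3]})

# shift(1) moves the values down one row; the index stays 0, 1, 2
aa['b'] = aa['a'].shift(1)

# Both columns share the same index, so the subtraction is row by row
diff = aa['a'] - aa['b']
print(aa)
print(diff)
```

Here diff should keep the three labels 0, 1, 2 with the values NaN, 1.0, 1.0, which is the result the question was expecting.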

Upvotes: 1

Pascal G. Bernard

Reputation: 279

When you do aa.b - aa.a, you're subtracting two pandas.Series that have the same length but not the same index:

aa.a

1    1
2    2
3    3
Name: a, dtype: int64

Whereas:

aa.b

0    NaN
1    1.0
2    2.0
Name: b, dtype: float64

And when you do:

print(aa.b - aa.a)

pandas aligns the two Series on the union of their indexes before computing (regardless of the operation: addition or subtraction). That's why the indexes [0,1,2] and [1,2,3] are combined into a new index from 0 to 3: [0,1,2,3]. A label that exists in only one of the two Series gives NaN.

For instance, if you shift bb.index by 2 instead of 1:

bb.index = bb.index + 2

this time you will have 5 rows in the resulting pandas.Series instead of 4, and so on:

bb.index = bb.index + 2
aa['b'] = bb
print(aa.a - aa.b)

0    NaN
1    NaN
2    0.0
3    NaN
4    NaN
dtype: float64
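
The alignment described above can be reproduced with hand-built Series, which avoids depending on how a particular pandas version hands out DataFrame columns. This is a minimal sketch; subtract_shifted is a hypothetical helper written only for this illustration.

```python
import pandas as pd

def subtract_shifted(shift):
    """Subtract a Series whose labels are moved up by `shift` from the original."""
    a = pd.Series([1, 2, 3], index=[0, 1, 2])
    b = pd.Series([1, 2, 3], index=[i + shift for i in a.index])
    # pandas aligns on the union of both indexes; unmatched labels become NaN
    return a - b

for shift in (1, 2):
    result = subtract_shifted(shift)
    print(shift, list(result.index))
```

With a shift of 1 the result should have the labels 0 to 3 (4 rows), and with a shift of 2 the labels 0 to 4 (5 rows), with NaN wherever a label exists on one side only.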

Upvotes: 1

User

Reputation: 826

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
aa = aa.reset_index(drop=True)  # add this; reset_index returns a new DataFrame, so assign the result

Your indexes do not match.
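
A quick way to confirm a mismatch like this before combining two objects is to compare the indexes directly. The sketch below uses .copy() so that aa itself is not affected; same_index and shared are names made up for the example.

```python
import pandas as pd

aa = pd.DataFrame({'a': [1, 2, 3]})
bb = aa['a'].copy()          # copy, so aa's own column is left untouched
bb.index = bb.index + 1      # labels become 1, 2, 3

same_index = aa.index.equals(bb.index)          # do the labels match exactly?
shared = list(aa.index.intersection(bb.index))  # labels present in both
print(same_index, shared)
```

Here same_index should be False and shared should contain only the labels 1 and 2, which are the only rows where an arithmetic operation between aa and bb can produce a non-NaN value.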

Upvotes: 2
