Reputation: 3812
Consider a trivial example with a DataFrame df and a Series s:
import pandas as pd
matching_vals = range(20,30)
df = pd.DataFrame(columns=['a'], index=range(0,10))
df['a'] = matching_vals
s = pd.Series(list("ABCDEFGHIJ"), index=matching_vals)
df['b'] = s[df['a']]
At this point I would expect df['b'] to contain the letters A through J, but instead it's all NaN. However, if I replace the last line with
n = df['a'][2]
df['c'] = s[n]
then df['c'] is filled with Cs, as I'd expect, so I'm pretty sure it's not some strange type error.
I'm new to pandas, and this is driving me crazy.
Upvotes: 5
Views: 8305
Reputation: 879561
s[df['a']] has an index which is different from df's index:
In [104]: s[df['a']]
Out[104]:
a
20 A
21 B
22 C
23 D
24 E
25 F
26 G
27 H
28 I
29 J
When you assign a Series to a column of a DataFrame, pandas aligns the values on the index. Since s[df['a']] is labelled 20 through 29 and has no values at the index labels of df (0 through 9), NaN values are assigned. The assignment does not add new rows to df.
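The alignment described above can be checked directly. This is a minimal sketch that rebuilds the question's df and s (using a dict constructor for brevity, which is my shorthand and not the asker's exact code) and compares the index labels on both sides of the assignment:

```python
import pandas as pd

# Same data as in the question: df is indexed 0..9, s is indexed 20..29.
df = pd.DataFrame({"a": range(20, 30)})
s = pd.Series(list("ABCDEFGHIJ"), index=range(20, 30))

# The lookup returns the letters, but labelled with s's index (20..29).
looked_up = s[df["a"]]
print(looked_up.index.tolist())
print(df.index.tolist())

# Assignment aligns on index labels; 20..29 and 0..9 share no labels,
# so every row of the new column ends up missing.
df["b"] = looked_up
print(df["b"].isna().all())
```

Because the two sets of labels do not overlap, there is nothing to align, which is why the whole column is NaN instead of raising an error.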
If you don't want the index to enter into the assignment, you could use
df['b'] = s[df['a']].values
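As a rough sketch of this positional approach, the snippet below uses .to_numpy(), which is the more recent spelling of .values, and also shows Series.map as an alternative. Note that map is my suggestion and not part of the answer above; it looks up each value of df['a'] in s and keeps df's own index, so no alignment problem arises.

```python
import pandas as pd

# Same data as in the question: df is indexed 0..9, s is indexed 20..29.
df = pd.DataFrame({"a": range(20, 30)})
s = pd.Series(list("ABCDEFGHIJ"), index=range(20, 30))

# Positional assignment: strip the index from the lookup result so the
# values are written row by row, in order.
df["b"] = s[df["a"]].to_numpy()

# Alternative (an assumption of this sketch, not from the answer):
# map each value of df["a"] through s; the result keeps df's index.
df["c"] = df["a"].map(s)

print(df)
```

Both columns should hold the letters A through J in row order. The positional version relies on the lookup result having the same length and order as df, while map does the lookup per row.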
For a demonstration of the matching of indices, notice how
import pandas as pd
df = pd.DataFrame(columns=['a'], index=range(0,10))
df['a'] = range(0,10)[::-1]
s = pd.Series(list("ABCDEFGHIJ"), index=range(0,10)[::-1])
df['b'] = s[df['a']]
yields
In [123]: s[df['a']]
Out[123]:
a
9 A
8 B
7 C
6 D
5 E
4 F
3 G
2 H
1 I
0 J
dtype: object
In [124]: df
Out[124]:
a b
0 9 J
1 8 I
2 7 H
3 6 G
4 5 F
5 4 E
6 3 D
7 2 C
8 1 B
9 0 A
[10 rows x 2 columns]
The values of df['b'] are "flipped" to make the indices match.
Upvotes: 7