sssbbbaaa

Reputation: 244

Map a Pandas Series with duplicate keys to a DataFrame

Env: Python 3.9.6, Pandas 1.3.5


I have a DataFrame and a Series like the ones below:

import pandas as pd

df = pd.DataFrame({"C1": ["A", "B", "C", "D"]})
sr = pd.Series(data=[1, 2, 3, 4, 5],
               index=["A", "A", "B", "C", "D"])
"""
[DataFrame]
   C1
0  A
1  B
2  C
3  D

[Series]
A    1
A    2
B    3
C    4
D    5
"""

What I tried:

df["C2"] = df["C1"].map(sr)

But this raises an InvalidIndexError because the Series has a duplicate key ("A"):

pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
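The error comes from map treating the Series index as a lookup table: with two "A" labels, the lookup for "A" is ambiguous. A quick way to confirm which labels are repeated, sketched here with the same Series, is:

```python
import pandas as pd

sr = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "A", "B", "C", "D"])

# map() uses sr's index as a lookup table, so each label must appear only once
print(sr.index.is_unique)

# keep=False marks every occurrence of a repeated label, not only the later ones
dup_labels = sr.index[sr.index.duplicated(keep=False)].unique()
print(list(dup_labels))
```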

Is there a way to build a DataFrame like the one below?

   C1 C2
0  A  1
1  A  2
2  B  3
3  C  4
4  D  5

or

   C1 C2
0  A  1
1  B  3
2  C  4
3  D  5
4  A  2

Row indices do not matter.

Upvotes: 0

Views: 421

Answers (1)

mozway

Reputation: 262559

The question was heavily edited and now has a very different meaning.

You want a simple merge:

df.merge(sr.rename('C2'),
         left_on='C1', right_index=True)

Output:

  C1  C2
0  A   1
0  A   2
1  B   3
2  C   4
3  D   5
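Note that merge defaults to how='inner', so any row of df whose C1 value has no match in sr is silently dropped, whereas map would keep it with NaN. A minimal sketch, assuming you want to keep such rows (the extra key "E" is hypothetical, added only to show the difference), passes how='left' and resets the repeated index:

```python
import pandas as pd

# "E" is a hypothetical key with no match in sr
df = pd.DataFrame({"C1": ["A", "B", "C", "D", "E"]})
sr = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "A", "B", "C", "D"])

# Default inner join: the "E" row disappears from the result
inner = df.merge(sr.rename("C2"), left_on="C1", right_index=True)

# Left join: every row of df is kept, unmatched keys get NaN in C2
left = (
    df.merge(sr.rename("C2"), left_on="C1", right_index=True, how="left")
    .reset_index(drop=True)  # replace the repeated 0, 0, 1, ... index with 0..n-1
)
print(left)
```

Because of the NaN, C2 is upcast to float in the left-join result.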
Old answer (written for the original version of the question):

First, I can't reproduce your issue (tested with 3M rows on pandas 1.3.5).

Second, why use slicing rather than map? map always returns exactly one value per row (NaN when the key is absent from the Series), so the output is guaranteed to have the correct number of rows:

Example:

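import numpy as np
import pandas as pd

sr = pd.Series({10: "A", 13: "B", 16: "C", 18: "D"})
# 3 million random keys in [10, 20); keys with no entry in sr map to NaN
df = pd.DataFrame({"C1": np.random.randint(10, 20, size=3000000)})
df["C2"] = df["C1"].map(sr)
print(df.head())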

Output (the exact values vary between runs because the data is random):

   C1   C2
0  10    A
1  18    D
2  10    A
3  13    B
4  15  NaN
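Since the random data above changes on every run, here is a small deterministic sketch (the key values are made up for illustration) that checks the two properties claimed for map: the result has one entry per row of df, and keys missing from sr become NaN.

```python
import pandas as pd

sr = pd.Series({10: "A", 13: "B", 16: "C", 18: "D"})
# 15 is deliberately absent from sr's index
df = pd.DataFrame({"C1": [10, 18, 10, 13, 15]})

# One lookup per row: matched keys get their label, the unmatched key gets NaN
df["C2"] = df["C1"].map(sr)
print(df)
```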

Upvotes: 2
