reservoirinvest
reservoirinvest

Reputation: 1777

Join two dataframes with duplicate values in key

Have two dataframes:

df1 =

    Col Date        Days
0   A   20180830    30
1   A   20180927    58
2   A   20181025    86
3   B   20180830    30
4   B   20180927    58
5   B   20181025    86
6   C   20180802    2
7   C   20180809    9
8   C   20180816    16
9   C   20180823    23

df2 =

    Col Lot     Pct
13  A   4000    16.19
184 B   600     7.51
206 C   250     5.00
...

How to make a single dataframe:

df =

    Col Date        Days    Lot     Pct
0   A   20180830    30      4000    16.19
1   A   20180927    58      4000    16.19
2   A   20181025    86      4000    16.19
3   B   20180830    30      600     7.51

df1.join(df2, on='Col') gives KeyError: 'Col' # Check for duplicates

Upvotes: 2

Views: 7622

Answers (1)

Abhay Padda
Abhay Padda

Reputation: 146

You can instead use pd.merge to join to data frames in Pandas.

Example below:

import pandas as pd

data_1 = {'Col': ['A', 'A', 'B', 'B', 'C', 'C'], 'Date': [20180830, 20180927, 20180830, 20180927, 20180809, 20180816], 'Days':[1,2,3,4,5,6]}

df1 = pd.DataFrame(data=data_1)

df1

df =

   Col  Date    Days
0   A   20180830    1
1   A   20180927    2
2   B   20180830    3
3   B   20180927    4
4   C   20180809    5
5   C   20180816    6

Create df2

data_2 = {'Col': ['A', 'B', 'C'], 'Lot': [4000, 600, 250], 'PCT': [16.19, 7.51, 5.00]}

df2 = pd.DataFrame(data=data_2)

df2 =

    Col Lot     PCT
0   A   4000    16.19
1   B   600      7.51
2   C   250      5.00

You can join the above two dataframes using:

pd.merge(df1, df2, on="Col")

The output will be:

    Col Date       Days Lot     PCT
0   A   20180830    1   4000    16.19
1   A   20180927    2   4000    16.19
2   B   20180830    3   600      7.51
3   B   20180927    4   600      7.51
4   C   20180809    5   250      5.00
5   C   20180816    6   250      5.00

Upvotes: 5

Related Questions