dongxu mo
dongxu mo

Reputation: 23

how to match two dataFrame in python

I have two dataFrame in Python. The first one is df1:

'ID'    'B' 
   AA    10
   BB    20
   CC    30
   DD    40

The second one is df2:

 'ID'  'C'  'D'
    BB   30   0
    DD   35   0

What I want to get finally is like df3:

'ID'  'C'  'D'
   BB   30   20
   DD   35   40

how to reach this goal? my code is:

for i in df.ID
  if len(df2.ID[df2.ID==i]):
    df2.D[df2.ID==i]=df1.B[df2.ID==i]

but it doesn't work.

Upvotes: 2

Views: 14233

Answers (2)

Work of Artiz
Work of Artiz

Reputation: 1090

So first of all, I've interpreted the question differently, since your description is rather ambiguous. Mine boils down to this:

df1 is this data structure:

ID   B            <- column names
AA  10
BB  20
CC  30
DD  40

df2 is this data structure:

ID   C  D        <- column names
BB  30  0
DD  35  0

Dataframes have a merge option, if you wanted to merge based on index the following code would work:

import pandas as pd

df1 = pd.DataFrame(
    [
        ['AA', 10],
        ['BB', 20],
        ['CC', 30],
        ['DD', 40],
    ],
    columns=['ID','B'],
)
df2 = pd.DataFrame(
    [
        ['BB', 30, 0],
        ['DD', 35, 0],
    ], columns=['ID', 'C', 'D']
)

df3 = pd.merge(df1, df2, on='ID')

Now df3 only contains rows with ID's in both df1 and df2:

ID   B   C  D    <- column names
BB  20  30  0
DD  40  35  0

Now you were trying to remove D, and fill it in with column B, a.k.a

ID  C  D
BB 30 20 
DD 35 40

Something that can be done with these simple steps:

df3 = pd.merge(df1, df2, on='ID') # merge them
df3.D = df3['B']                  # set D to B's values
del df3['B']                      # remove B from df3

Or to summarize:

def match(df1, df2):
    df3 = pd.merge(df1, df2, on='ID') # merge them
    df3.D = df3['B']                  # set D to B's values
    del df3['B']                      # remove B from df3
    return df3

Upvotes: 3

Shijo
Shijo

Reputation: 9721

Following code will replace zero in df1 with value df2

df1=pd.DataFrame(['A','B',0,4,6],columns=['x'])
df2=pd.DataFrame(['A','X',3,0,5],columns=['x'])
df3=df1[df1!=0].fillna(df2)

Upvotes: 0

Related Questions