ubuntu_noob
ubuntu_noob

Reputation: 2365

getting wrong result while merging pandas dataframe

I have two dataframes like-

    identity time       Date    matched_time
0   197_$   21:21:21    9/11/2015   21:21:30
0   197_$   21:21:51    9/11/2015   21:22:00
0   197_$   21:22:21    9/11/2015   21:22:30
0   197_$   21:22:51    9/11/2015   21:23:00
0   197_$   21:23:21    9/11/2015   21:23:30
0   197_$   21:23:51    9/11/2015   21:24:00



 identity Line  Epoch   Day Seconds   Date        Time  
    197_$    9344   11203   4   280290  9/11/2015   1/1/1900 21:21  
    197_$    9345   11204   4   280320  9/11/2015   1/1/1900 21:22  
    197_$    9346   11205   4   280350  9/11/2015   1/1/1900 21:22  
    197_$    9347   11206   4   280380  9/11/2015   1/1/1900 21:23  
    197_$    9348   11207   4   280410  9/11/2015   1/1/1900 21:23  
    197_$    9349   11208   4   280440  9/11/2015   1/1/1900 21:24  

Now I want to merge the columns to create a new dataframe- I did-

df2=pd.merge(df,out,how='outer')

but the desired output was not obtained.I just wanted to create a dataframe which has all the columns.

so the dataframe should look like this-

identity time       Date    matched_time      Line Epoch  ....
0   197_$  21:21:21    9/11/2015   21:21:30  9344  11203  ....
0   197_$  21:21:51    9/11/2015   21:22:00  9345  11204
0   197_$  21:22:21    9/11/2015   21:22:30  9346  11205
0   197_$  21:22:51    9/11/2015   21:23:00  9347  11206
0   197_$  21:23:21    9/11/2015   21:23:30  9348  11207
0   197_$  21:23:51    9/11/2015   21:24:00  9349  11208

Upvotes: 0

Views: 704

Answers (1)

gyoza
gyoza

Reputation: 2152

In general you should not use merge() unless you have unique keys in at least one side (left or right). Instead, use concat() if you have identical columns in both dataframes. I omitted a column Time of your 2nd dataframe for simplicity.

df1:

    identity    time    Date    matched_time
0   197_$   21:21:21    9/11/2015   21:21:30
1   197_$   21:21:51    9/11/2015   21:22:00
2   197_$   21:22:21    9/11/2015   21:22:30
3   197_$   21:22:51    9/11/2015   21:23:00
4   197_$   21:23:21    9/11/2015   21:23:30
5   197_$   21:23:51    9/11/2015   21:24:00

df2:

    identityLine    Epoch   Day Seconds Date
0   197_$   9344    11203   4   280290  9/11/2015
1   197_$   9345    11204   4   280320  9/11/2015
2   197_$   9346    11205   4   280350  9/11/2015
3   197_$   9347    11206   4   280380  9/11/2015
4   197_$   9348    11207   4   280410  9/11/2015
5   197_$   9349    11208   4   280440  9/11/2015

Combine 2 dataframes using concat():

df3 = (pd.concat([df1.set_index(['identity', 'Date']), 
                  df2.set_index(['identity', 'Date'])], 
                  axis=1).reset_index(drop=False))

Output(df3):

    identity    Date    time    matched_time    Line    Epoch   Day Seconds
0   197_$   9/11/2015   21:21:21    21:21:30    9344    11203   4   280290
1   197_$   9/11/2015   21:21:51    21:22:00    9345    11204   4   280320
2   197_$   9/11/2015   21:22:21    21:22:30    9346    11205   4   280350
3   197_$   9/11/2015   21:22:51    21:23:00    9347    11206   4   280380
4   197_$   9/11/2015   21:23:21    21:23:30    9348    11207   4   280410
5   197_$   9/11/2015   21:23:51    21:24:00    9349    11208   4   280440

Hope this helps..

Upvotes: 1

Related Questions