Reputation: 2365
I have two dataframes like-
identity time Date matched_time
0 197_$ 21:21:21 9/11/2015 21:21:30
0 197_$ 21:21:51 9/11/2015 21:22:00
0 197_$ 21:22:21 9/11/2015 21:22:30
0 197_$ 21:22:51 9/11/2015 21:23:00
0 197_$ 21:23:21 9/11/2015 21:23:30
0 197_$ 21:23:51 9/11/2015 21:24:00
identity Line Epoch Day Seconds Date Time
197_$ 9344 11203 4 280290 9/11/2015 1/1/1900 21:21
197_$ 9345 11204 4 280320 9/11/2015 1/1/1900 21:22
197_$ 9346 11205 4 280350 9/11/2015 1/1/1900 21:22
197_$ 9347 11206 4 280380 9/11/2015 1/1/1900 21:23
197_$ 9348 11207 4 280410 9/11/2015 1/1/1900 21:23
197_$ 9349 11208 4 280440 9/11/2015 1/1/1900 21:24
Now I want to merge the columns to create a new dataframe- I did-
df2=pd.merge(df,out,how='outer')
but the desired output was not obtained.I just wanted to create a dataframe which has all the columns.
so the dataframe should look like this-
identity time Date matched_time Line Epoch ....
0 197_$ 21:21:21 9/11/2015 21:21:30 9344 11203 ....
0 197_$ 21:21:51 9/11/2015 21:22:00 9345 11204
0 197_$ 21:22:21 9/11/2015 21:22:30 9346 11205
0 197_$ 21:22:51 9/11/2015 21:23:00 9347 11206
0 197_$ 21:23:21 9/11/2015 21:23:30 9348 11207
0 197_$ 21:23:51 9/11/2015 21:24:00 9349 11208
Upvotes: 0
Views: 704
Reputation: 2152
In general you should not use merge()
unless you have unique keys in at least one side (left or right). Instead, use concat()
if you have identical columns in both dataframes. I omitted a column Time
of your 2nd dataframe for simplicity.
df1
:
identity time Date matched_time
0 197_$ 21:21:21 9/11/2015 21:21:30
1 197_$ 21:21:51 9/11/2015 21:22:00
2 197_$ 21:22:21 9/11/2015 21:22:30
3 197_$ 21:22:51 9/11/2015 21:23:00
4 197_$ 21:23:21 9/11/2015 21:23:30
5 197_$ 21:23:51 9/11/2015 21:24:00
df2
:
identityLine Epoch Day Seconds Date
0 197_$ 9344 11203 4 280290 9/11/2015
1 197_$ 9345 11204 4 280320 9/11/2015
2 197_$ 9346 11205 4 280350 9/11/2015
3 197_$ 9347 11206 4 280380 9/11/2015
4 197_$ 9348 11207 4 280410 9/11/2015
5 197_$ 9349 11208 4 280440 9/11/2015
Combine 2 dataframes using concat()
:
df3 = (pd.concat([df1.set_index(['identity', 'Date']),
df2.set_index(['identity', 'Date'])],
axis=1).reset_index(drop=False))
Output(df3
):
identity Date time matched_time Line Epoch Day Seconds
0 197_$ 9/11/2015 21:21:21 21:21:30 9344 11203 4 280290
1 197_$ 9/11/2015 21:21:51 21:22:00 9345 11204 4 280320
2 197_$ 9/11/2015 21:22:21 21:22:30 9346 11205 4 280350
3 197_$ 9/11/2015 21:22:51 21:23:00 9347 11206 4 280380
4 197_$ 9/11/2015 21:23:21 21:23:30 9348 11207 4 280410
5 197_$ 9/11/2015 21:23:51 21:24:00 9349 11208 4 280440
Hope this helps..
Upvotes: 1