Merging two pandas data frames

Question

I have one dataframe, A, that looks like this:

+---+------------+-----------+-----------+-----+-------+
|   |    time    |    uid    |   o_uid   | msg | count |
+---+------------+-----------+-----------+-----+-------+
| 0 | 1433131357 | 191470529 | 191159572 | eis |     1 |
| 1 | 1433131410 | 191458009 | 160429326 | eis |     1 |
| 2 | 1433131504 | 191470523 | 153734142 | eis |     1 |
| 3 | 1433131685 | 191470551 | 191470546 | eis |     1 |
| 4 | 1433131782 | 191470565 | 187367195 | eis |     1 |
+---+------------+-----------+-----------+-----+-------+

And another dataframe, B, that looks like this:

+---+------------+-----------+-------+
|   |    time    |    uid    | count |
+---+------------+-----------+-------+
| 0 | 1433131967 | 191470529 |     1 |
| 1 | 1433132503 | 191466638 |     1 |
| 2 | 1433139333 | 191451858 |     1 |
| 3 | 1433141249 | 191470551 |     1 |
| 4 | 1433143867 | 191471209 |     1 |
+---+------------+-----------+-------+

What I would like to do is take all that timestamps from B and place them in a column in A where the UIDs match. Where there is no match, there should be a NaN

I tried this: df = pd.merge(A, B, left_on='uid', right_on='uid', how='outer')

But I think it only appended B to the bottom of A. It did not work as expected.

Alexander · Accepted Answer

Both your time and count columns overlap in both DataFrames, so you need to supply suffixes as a parameter. In the example below, I use a null suffix for df_a and '_b' as the suffix for df_b.

import pandas as pd

df_a = pd.DataFrame({'count': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
                     'msg': {0: 'eis', 1: 'eis', 2: 'eis', 3: 'eis', 4: 'eis'},
                     'o_uid': {0: 191159572, 1: 160429326, 2: 153734142, 3: 191470546, 4: 187367195},
                     'time': {0: 1433131357, 1: 1433131410, 2: 1433131504, 3: 1433131685, 4: 1433131782},
                     'uid': {0: 191470529, 1: 191458009, 2: 191470523, 3: 191470551, 4: 191470565}})

df_b = pd.DataFrame({'count': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
                     'time': {0: 1433131967, 1: 1433132503, 2: 1433139333, 3: 1433141249, 4: 1433143867},
                     'uid': {0: 191470529, 1: 191466638, 2: 191451858, 3: 191470551, 4: 191471209}})

>>> df_a.merge(df_b, how='outer', on='uid', suffixes=['', '_b'])
   count  msg      o_uid        time        uid  count_b      time_b
0      1  eis  191159572  1433131357  191470529        1  1433131967
1      1  eis  160429326  1433131410  191458009      NaN         NaN
2      1  eis  153734142  1433131504  191470523      NaN         NaN
3      1  eis  191470546  1433131685  191470551        1  1433141249
4      1  eis  187367195  1433131782  191470565      NaN         NaN
5    NaN  NaN        NaN         NaN  191466638        1  1433132503
6    NaN  NaN        NaN         NaN  191451858        1  1433139333
7    NaN  NaN        NaN         NaN  191471209        1  1433143867

Merging two pandas data frames

Answers (2)

Related Questions