Vivek Anand
Vivek Anand

Reputation: 391

Pandas doing concat instead of merge on doing merge operation

I have two dataframes,df1:

                      name              mark
     0               Alex  [Tue, 0.0, 10, 0.0, 0.0]
     1               John  [Tue, 0.0, 10, 0.0, 0.0]
     2                Tom  [Tue, 0.0, 10, 0.0, 0.0]
     3                Tim  [Tue, 0.0, 10, 0.0, 0.0]

and df2:

           name        mark1
   0       Alex  [11.0, 0.0, 1.0]
   1       John  [12.0, 0.0, 4.0]
   2        Tom  [12.0, 0.0, 4.0]

and when i do :

merged = pd.merge(df1,df2,how='outer',on='name').fillna(0)

i expected to get something like this:

                 name              mark                   mark1
     0           Alex  [Tue, 0.0, 10, 0.0, 0.0]   [11.0, 0.0, 1.0]
     1           John  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
     2            Tom  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
     3            Tim  [Tue, 0.0, 10, 0.0, 0.0]        0

but i got something like this(which looks more like concat):

                 name              mark              mark1
     0           Alex  [Tue, 0.0, 10, 0.0, 0.0]       0
     1           John  [Tue, 0.0, 10, 0.0, 0.0]       0
     2            Tom  [Tue, 0.0, 10, 0.0, 0.0]       0
     3            Tim  [Tue, 0.0, 10, 0.0, 0.0]       0
     4           Alex          0                [11.0, 0.0, 1.0]
     5           John          0                [12.0, 0.0, 4.0]
     6            Tom           0               [12.0, 0.0, 4.0]

can someone please tell me what wrong am i doing? so this is my whole code :

                       name              mark
     0                Alex   [Mon, 10.12, 12, 10.0, 17.0]
     1                Alex   [Wed, 10.12, 15, 10.0, 17.0]
     2                Alex   [Fri, 10.12, 7, 10.0, 17.0]
     3                Alex   [Tue, 0.0, 10, 0.0, 0.0]
     4                Alex   [Thu, 0.0, 16, 0.0, 0.0]
     5                Alex   [Sat, 0.0, 2, 0.0, 0.0]
     6                Alex   [Sun, 0.0, 12, 0.0, 0.0]
     7                John   [Fri, 10.12, 7, 10.0, 17.0]
     8                John   [Mon, 10.12, 12, 10.0, 17.0]
     9                John   [Tue, 0.0, 10, 0.0, 0.0]
    10               John   [Wed, 0.0, 15, 0.0, 0.0]
    11               John   [Thu, 0.0, 16, 0.0, 0.0]
    12               John   [Sat, 0.0, 2, 0.0, 0.0]
    13               John   [Sun, 0.0, 12, 0.0, 0.0]
    14                Tom  [Wed, 10.12, 15, 10.0, 17.0]
    15                Tom  [Mon, 10.12, 12, 10.0, 17.0]
    16                Tom   [Fri, 10.12, 7, 10.0, 17.0]
    17                Tom   [Tue, 0.0, 10, 0.0, 0.0]
    18                Tom   [Thu, 0.0, 16, 0.0, 0.0]
    19                Tom   [Sat, 0.0, 2, 0.0, 0.0]
    20                Tom   [Sun, 0.0, 12, 0.0, 0.0]
    21                Tim  [Mon, 10.12, 12, 10.0, 17.0]
    22                Tim  [Fri, 10.12, 7, 10.0, 17.0]
    23                Tim   [Tue, 0.0, 10, 0.0, 0.0]
    24                Tim   [Wed, 0.0, 15, 0.0, 0.0]
    25                Tim   [Thu, 0.0, 16, 0.0, 0.0]
    26                Tim   [Sat, 0.0, 2, 0.0, 0.0]
    27                Tim   [Sun, 0.0, 12, 0.0, 0.0]

and then i do :

df = (df.groupby(['name'])['mark'].apply(list).apply(lambda x: [list(y) for y 
                 in set([tuple(j) for j in x])]).reset_index())

which gives me :

                  name                                               mark
 0               Alex  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
 1               John  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
 2                Tom  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
 3                Tim  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...

second dataframe is also obtained by doing something like this.(sorry for not placing the exact dataframe cuz it was kind of messy)

Upvotes: 1

Views: 36

Answers (2)

wwnde
wwnde

Reputation: 26676

Merge left, on column 'name'

df1.merge(df2, how='left',on='name')

  name                     mark               mark1
0  Alex  [Tue, 0.0, 10, 0.0, 0.0]   [11.0, 0.0, 1.0]
1  John  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
2   Tom  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
3   Tim  [Tue, 0.0, 10, 0.0, 0.0]                NaN

Upvotes: 1

ldren
ldren

Reputation: 159

The following should work:

merged = df1.merge(df2, how='left',on='name').fillna(0)

This is because in your merge you are performing a full outer join.

Upvotes: 1

Related Questions