user2890059
user2890059

Reputation: 145

Pandas - How to merge two DataFrames

I reword my question. I'm searching solution for the following problem:

I have a DataFrame like:

 Sp   Mt   Value  count
4  MM2  S4   bg     10
5  MM2  S4   dgd    1
6  MM4  S2   rd     2
7  MM4  S2   cb     8
8  MM4  S2   uyi    8

Another DataFrame like:

  Sp   Mt   Value  count
4  MM2  S4   bg     9
5  MM2  S4   dgd    10
6  MM4  S2   rd     21
7  MM4  S2   cb     8
8  MM4  S2   uyi    8

I want to merge both like :

 Sp   Mt   Value  count
4  MM2  S4   bg    [10,9]
5  MM2  S4   dgd   [1,10]
6  MM4  S2   rd    [2,21]
7  MM4  S2   cb    [8,8]
8  MM4  S2   uyi   [8,8]

Upvotes: 2

Views: 163

Answers (3)

piRSquared
piRSquared

Reputation: 294516

Using pd.Series.add on list

icol = ['Sp', 'Mt', 'Value']
d1 = df1.set_index(icol)['count']
d2 = df2.set_index(icol)['count']

(d1.apply(lambda x: [x]) +  d2.apply(lambda x: [x])).reset_index()

    Sp  Mt Value    count
0  MM2  S4    bg  [10, 9]
1  MM2  S4   dgd  [1, 10]
2  MM4  S2    rd  [2, 21]
3  MM4  S2    cb   [8, 8]
4  MM4  S2   uyi   [8, 8]

Upvotes: 0

Soner
Soner

Reputation: 126

frames=[df1,df2]
df3=pd.concat(frames)

df_merged_agg=df3.groupby(['Sp','Mt','Value']).agg({'count': lambda x: ','.join(x)}).reset_index()

The Result would be:

 Sp  Mt Value count
0  MM2  S4    bg  10,9
1  MM2  S4   dgb  1,10
2  MM4  S2    cb   8,8
3  MM4  S2    rd  2,21
4  MM4  S2   uyi   8,8

To get tuple agg field:

df_merged_agg=df3.groupby(['Sp','Mt','Value']).agg(lambda x: tuple(x)).reset_index()

Upvotes: 0

jezrael
jezrael

Reputation: 863611

Use merge (default inner join) or concat (default outer join) for join both to one first. Then create new column and last remove unnecessary columns by drop:

df = pd.merge(df1, df2, on = ['Sp','Mt','Value'])
df['count'] = df.filter(like='count').values.tolist()
df = df.drop(['count_x','count_y'], axis=1)
print (df)
    Sp  Mt Value    count
0  MM2  S4    bg  [10, 9]
1  MM2  S4   dgd  [1, 10]
2  MM4  S2    rd  [2, 21]
3  MM4  S2    cb   [8, 8]
4  MM4  S2   uyi   [8, 8]

cols = ['Sp','Mt','Value']
df = pd.concat([df1.set_index(cols)['count'], 
                df2.set_index(cols)['count']],axis=1, keys=('count','a'))
df['count'] = df[['count','a']].values.tolist()
df = df.drop('a', axis=1).reset_index()
print (df)
    Sp  Mt Value    count
0  MM2  S4    bg  [10, 9]
1  MM2  S4   dgd  [1, 10]
2  MM4  S2    rd  [2, 21]
3  MM4  S2    cb   [8, 8]
4  MM4  S2   uyi   [8, 8]

Upvotes: 2

Related Questions