user8560167

Join same-named columns from two DataFrames with a separator, pandas

I am looking for the fastest way to join columns that have the same names, using a separator. My DataFrames:

df1:
A,B,C,D
my,he,she,it

df2:
A,B,C,D
dog,cat,elephant,fish

expected output:

df:
A,B,C,D
my:dog,he:cat,she:elephant,it:fish

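As you can see, I want to merge columns with the same names, combining two cells into one. I can use this code for column A:

# merge on the index so the shared column names get the _x/_y suffixes
df = df1.merge(df2, left_index=True, right_index=True)
df['A'] = df[['A_x', 'A_y']].apply(lambda x: ':'.join(x), axis=1)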

In my real dataset I have more than 30 columns, and I don't want to write the same lines for each of them. Is there a faster way to get my expected output?

Upvotes: 3

Views: 253

Answers (4)

VSharma

Reputation: 493

You can simply add the two DataFrames with the separator in between:

df = df1 + ':' + df2
print(df)

The addition is applied element-wise to every column at once, so nothing has to be written per column.
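One caveat worth sketching: pandas aligns the two frames on their index labels (and column names) before adding, so rows are paired by label, not by position. The example below assumes, purely for illustration, that df2 carries a different index label than df1; the names misaligned and joined are made up for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "C": ["she"], "D": ["it"]})
# Hypothetical situation: df2 has index label 5 instead of 0.
df2 = pd.DataFrame(
    {"A": ["dog"], "B": ["cat"], "C": ["elephant"], "D": ["fish"]},
    index=[5],
)

# Labels 0 and 5 do not match, so every cell of the result is missing.
misaligned = df1 + ":" + df2

# Resetting both indexes pairs the rows by position instead.
joined = df1.reset_index(drop=True) + ":" + df2.reset_index(drop=True)
print(joined)
```

If both frames already share the same index, as in the question, the reset_index calls are unnecessary.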

Upvotes: 0

Umar.H

Reputation: 23099

How about concat and groupby?

df3 = pd.concat([df1,df2],axis=0)
df3 = df3.groupby(df3.index).transform(lambda x : ':'.join(x)).drop_duplicates()
print(df3)
         A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
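A possible variation on the same idea, assuming every cell is a string: aggregating each index group with ':'.join returns one row per index label directly, so the drop_duplicates step is not needed. The data just mirrors the question's example, and the names stacked and joined are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "C": ["she"], "D": ["it"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"], "C": ["elephant"], "D": ["fish"]})

# Stack the frames vertically; both rows keep index label 0.
stacked = pd.concat([df1, df2], axis=0)

# Group rows that share an index label and join each column's strings.
joined = stacked.groupby(level=0).agg(":".join)
print(joined)
```

Unlike drop_duplicates, this keeps two different index labels that happen to produce identical joined rows.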

Upvotes: 2

igorkf

Reputation: 3575

How about this?

df3 = df1 + ':' + df2
print(df3)
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish

This is good because if there are columns that don't match, you get NaN for them, so you can filter them out later if you want:

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'], 'E': ['another'], 'F': ['and another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})
df1 + ':' + df2
        A       B             C        D    E    F
0  my:dog  he:cat  she:elephant  it:fish  NaN  NaN
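The filtering step could look something like this. It is only a sketch: it assumes you want to discard columns that exist in just one of the frames, and the names combined and only_shared are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "C": ["she"], "D": ["it"],
                    "E": ["another"], "F": ["and another"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"], "C": ["elephant"], "D": ["fish"]})

combined = df1 + ":" + df2

# Drop the columns that are entirely NaN, i.e. those missing from one frame.
only_shared = combined.dropna(axis=1, how="all")
print(only_shared)
```

Note that how="all" would also drop a shared column whose values are missing in every row, so it is worth checking the real data first.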

Upvotes: 2

SSharma

Reputation: 953

You can do this by simply adding the two DataFrames with a separator in between.

import pandas as pd

# build the two one-row example frames from the question
df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "C": ["she"], "D": ["it"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"], "C": ["elephant"], "D": ["fish"]})

print(df1)
print(df2)

df3 = df1 + ':' + df2
print(df3)

This will give you a result like:

    A   B    C   D
0  my  he  she  it
     A    B         C     D
0  dog  cat  elephant  fish
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish

Is this what you are trying to achieve? Note that this only works cleanly if both DataFrames have the same columns; any column present in only one of them will be NaN. What do you want to do with the columns that are not shared by df1 and df2? Please comment below to help me understand your problem better.
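If the answer turns out to be "keep the unshared columns as they are", one option might be to fill the NaN cells back in from the original frames with combine_first. This is a sketch under that assumption; the extra columns E and G and the names combined and filled are hypothetical.

```python
import pandas as pd

# Hypothetical extras: E exists only in df1, G only in df2.
df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "E": ["another"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"], "G": ["extra"]})

# Shared columns are joined; E and G come out as NaN.
combined = df1 + ":" + df2

# Fill the NaN cells from df1 first, then from df2.
filled = combined.combine_first(df1).combine_first(df2)
print(filled)
```

Shared columns keep their joined values because combine_first only fills cells that are missing.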

Upvotes: 2
