Reputation:
I am looking for the fastest way to join columns with the same names using a separator. My dataframes:
df1:
A,B,C,D
my,he,she,it
df2:
A,B,C,D
dog,cat,elephant,fish
expected output:
df:
A,B,C,D
my:dog,he:cat,she:elephant,it:fish
As you can see, I want to merge columns with the same names, combining the two cells into one.
I can use this code for column A:
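# Join on the row index so the shared columns get the _x/_y suffixes
df = df1.merge(df2, left_index=True, right_index=True)
df['A'] = df[['A_x', 'A_y']].apply(lambda x: ':'.join(x), axis=1)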
In my real dataset I have more than 30 columns, and I don't want to write the same lines for each of them. Is there a faster way to get my expected output?
Upvotes: 3
Views: 253
Reputation: 493
You can simply do:
df = df1 + ':' + df2
print(df)
This is simple and effective: pandas aligns the two dataframes on their index and column labels and concatenates the strings cell by cell.
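One caveat, as a rough sketch: the + operator only concatenates when every cell is a string. Assuming some of the 30+ real columns hold numbers (an assumption, since the question only shows strings), casting both frames with astype(str) first should let the same one-liner work. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical data: column B holds integers instead of strings
df1 = pd.DataFrame({'A': ['my'], 'B': [1]})
df2 = pd.DataFrame({'A': ['dog'], 'B': [2]})

# Cast every cell to str so that + concatenates instead of failing on the integer column
joined = df1.astype(str) + ':' + df2.astype(str)
print(joined)
```

Both frames also need matching row indexes, because the addition aligns on index labels as well as on column names.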
Upvotes: 0
Reputation: 23099
How about concat and groupby?
df3 = pd.concat([df1,df2],axis=0)
df3 = df3.groupby(df3.index).transform(lambda x : ':'.join(x)).drop_duplicates()
print(df3)
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
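A slightly shorter variant of the same idea, as a sketch: agg collapses each index group into a single row directly, so drop_duplicates is not needed and rows that happen to share identical joined values are not removed by accident. It assumes df1 and df2 have the same row index and contain only strings.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})

# Stack the frames, then join the values sharing an index label, column by column
df3 = pd.concat([df1, df2]).groupby(level=0).agg(':'.join)
print(df3)
```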
Upvotes: 2
Reputation: 3575
How about this?
df3 = df1 + ':' + df2
print(df3)
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
This is good because if there are columns that don't match, you get NaN for them, so you can filter them out later if you want:
df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'], 'E': ['another'], 'F': ['and another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})
df1 + ':' + df2
        A       B             C        D    E    F
0  my:dog  he:cat  she:elephant  it:fish  NaN  NaN
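If the unmatched columns are simply not wanted, two options are sketched below with the same sample frames: drop the columns that came out entirely NaN, or restrict both frames to their shared columns before adding. The variable names (dropped, common, selected) are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'],
                    'E': ['another'], 'F': ['and another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})

# Option 1: add everything, then drop the columns that are NaN in every row
dropped = (df1 + ':' + df2).dropna(axis=1, how='all')

# Option 2: keep only the column labels present in both frames before adding
common = df1.columns.intersection(df2.columns)
selected = df1[common] + ':' + df2[common]

print(dropped)
print(selected)
```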
Upvotes: 2
Reputation: 953
You can do this by simply adding the two dataframes with a separator.
import pandas as pd
df1 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
df2 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
df1["A"] = "my"
df1["B"] = "he"
df1["C"] = "she"
df1["D"] = "it"
df2["A"] = "dog"
df2["B"] = "cat"
df2["C"] = "elephant"
df2["D"] = "fish"
print(df1)
print(df2)
df3 = df1 + ':' + df2
print(df3)
This will give you a result like:
    A   B    C   D
0  my  he  she  it
     A    B         C     D
0  dog  cat  elephant  fish
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
Is this what you are trying to achieve? Note that this only works if both dataframes have the same columns; any extra columns will contain NaN. What do you want to do with the columns that are not shared by df1 and df2? Please comment below to help me understand your problem better.
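If the extra columns should be kept unchanged instead, one possible approach (an assumption about what is wanted, not something stated in the question) is to fill the NaN cells back in from the original frame with fillna. The column E below is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'E': ['another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat']})

# Columns missing from df2 come out as NaN; fillna restores df1's original values there
kept = (df1 + ':' + df2).fillna(df1)
print(kept)
```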
Upvotes: 2