Reputation: 910
I have 2 dataframes, one of them contains some general information about football players, and second of them contains other information like winning matches for each player. They both have the "id" column. However, they are not in same length.
What I want to do is creating a new dataframe which contains 2 columns: "x" from first dataframe and "y" from second dataframe, ONLY where the "id" column contains the same value in both dataframes. Thus, I can match the "x" and "y" columns which belong to same person.
I tried to do it using concat function:
pd.concat([firstdataframe['x'], seconddataframe['y']], axis=1, keys=['x', 'y'])
But I didn't manage to know how to apply the condition of the "id" being equal in both dataframes.
Upvotes: 2
Views: 1957
Reputation: 862521
It seems you need merge
with default inner
join, also each values in id
columns has to be unique:
df = pd.merge(df1[['id','x']], df2[['id','y']], on='id')
Sample:
df1 = pd.DataFrame({'id':[1,2,3],'x':[4,3,8]})
print (df1)
id x
0 1 4
1 2 3
2 3 8
df2 = pd.DataFrame({'id':[1,2],'y':[7,0]})
print (df2)
id y
0 1 7
1 2 0
df = pd.merge(df1[['id','x']], df2[['id','y']], on='id')
print (df)
id x y
0 1 4 7
1 2 3 0
Solution with concat
is possible, but a bit complicated, becasue need join on indexes with inner
join:
df = pd.concat([df1.set_index('id')['x'],
df2.set_index('id')['y']], axis=1, join='inner')
.reset_index()
print (df)
id x y
0 1 4 7
1 2 3 0
EDIT:
If id
s are not unique, duplicates create all combinations and output dataframe
is expanded:
df1 = pd.DataFrame({'id':[1,2,3],'x':[4,3,8]})
print (df1)
id x
0 1 4
1 2 3
2 3 8
df2 = pd.DataFrame({'id':[1,2,1,1],'y':[7,0,4,2]})
print (df2)
id y
0 1 7
1 2 0
2 1 4
3 1 2
df = pd.merge(df1[['id','x']], df2[['id','y']], on='id')
print (df)
id x y
0 1 4 7
1 1 4 4
2 1 4 2
3 2 3 0
Upvotes: 1