Reputation: 881
I want to merge two data frame.
list 1:
id
0 1234
1 456
2 789
3 101112
list 2:
id age
0 456 18
1 101112 52
Desired result:
id age
0 1234 Na
1 456 18
2 789 Na
3 101112 52
tried:
list1.join(list2,list1.id==list2.id,'inner')
error:
ValueError: Can only compare identically-labeled Series objects
tried:
list1.merge(list2)
error:
result is list 2.
what is the problem?
Upvotes: 0
Views: 5920
Reputation: 41957
If you have two dataframe
s as
list1 = sqlContext.createDataFrame([(1234,), (456,), (789,), (101112,)], ["id"])
list1.show()
+------+
| id|
+------+
| 1234|
| 456|
| 789|
|101112|
+------+
and
list2 = sqlContext.createDataFrame([(456,18), (101112, 52)], ["id", "age"])
list2.show()
+------+---+
| id|age|
+------+---+
| 456| 18|
|101112| 52|
+------+---+
A left join
should give you your desired result
list1.join(list2, "id", "left").show()
+------+----+
| id| age|
+------+----+
| 789|null|
| 1234|null|
|101112| 52|
| 456| 18|
+------+----+
Upvotes: 2
Reputation: 4207
You need to perform a left_outer
join so it doesn't drop records
list3 = list1.join(list2, 'id', 'left_outer')
list3.show()
+------+----+
| id| age|
+------+----+
| 789|null|
| 1234|null|
|101112| 52|
| 456| 18|
+------+----+
Upvotes: 3