Reputation: 105
For my thesis, I am trying to use several variables from two types of surveys (the British Election Studies (BES) and the British Social Attitudes Survey (BSA)) and combine them into one dataset. Currently, I have two datasets, one with BES data, which looks like this (in simplified version):
| year | class | education | gender | age |
| ---- | ----- | --------- | ------ | --- |
| 1992 | working | A-levels | female | 32 |
| 1992 | middle | GCSE | male | 49 |
| 1997 | lower | Undergrad | female | 24 |
| 1997 | middle | GCSE | male | 29 |
The BSA data looks like this (again, simplified):
| year | class | education | gender | age |
| ---- | ----- | --------- | ------ | --- |
| 1992 | middle | A-levels | male | 22 |
| 1993 | working | GCSE | female | 45 |
| 1994 | upper | Postgrad | female | 38 |
| 1994 | middle | GCSE | male | 59 |
Basically, what I am trying to do is combine the two into one dataframe that looks like this:
| year | class | education | gender | age |
| ---- | ----- | --------- | ------ | --- |
| 1992 | working | A-levels | female | 32 |
| 1992 | middle | GCSE | male | 49 |
| 1992 | middle | A-levels | male | 22 |
| 1993 | working | GCSE | female | 45 |
| 1994 | upper | Postgrad | female | 38 |
| 1994 | middle | GCSE | male | 59 |
| 1997 | lower | Undergrad | female | 24 |
| 1997 | middle | GCSE | male | 29 |
I have googled a lot about joins and merging, but I can't figure it out in a way that works correctly. From what I understand, I believe I should join "by" the year variable, but is that correct? And how can I prevent it taking up a lot of memory to perform the computation (the actual datasets are about 30k for the BES and 130k for the BSA)? Is there a solution using either dplyr or data.tables in R?
Any help is much appreciated!!!
Upvotes: 0
Views: 39
Reputation: 160792
This is not a "merge" (or join) operation, it's just row-concatenation. In R, that's done with rbind
(which works for matrix
and data.frame
using different methods). (For perspective, there's also cbind
, which concatenates by columns. Not applicable here.)
rbind(BES, BSA)
# year class education gender age
# 1 1992 working A-levels female 32
# 2 1992 middle GCSE male 49
# 3 1997 lower Undergrad female 24
# 4 1997 middle GCSE male 29
# 5 1992 middle A-levels male 22
# 6 1993 working GCSE female 45
# 7 1994 upper Postgrad female 38
# 8 1994 middle GCSE male 59
dplyr::bind_rows(BES, BSA)
data.table::rbindlist(list(BES, BSA))
Upvotes: 2