Reputation: 111
Given data frames the first column of which is the list of country names and is common in all data frames and the remainder columns are the years for which the value of the indicator is measured and these being the years is also common in data frames, what are the ways to merge the datasets by the first column? How to merge into a multidimensional array? dataset example:
country name | 2005 | .... | 2020 |
---|---|---|---|
Aruba | 23591 | ||
Angola | 1902 |
country name | 2005 | .... | 2020 |
---|---|---|---|
Aruba | -8.8 | ||
Angola | -3.5 |
Doing a full_join
library(dplyr)
full_join(DataSet1,DataSet2, by = 'country name')
changes the name of the columns and the data is not accessible.
Upvotes: 0
Views: 41
Reputation: 269556
1) Assuming the data frames in the Note at the end we can use bind_rows
library(dplyr)
bind_rows(DF1, DF2, .id = "id")
giving the following which takes all the rows from both data frames and identifies which data frame each row came from.
id countryName 2005 2006
1 1 Aruba 1 2
2 1 Angola 3 4
3 2 Aruba 11 12
4 2 Angola 13 14
2) Another possibility is to create a 3d array
library(abind)
a <- abind(DF1[-1], DF2[-1], along = 3, new.names = list(DF1$countryName,NULL,1:2))
a
giving this 3d array where the dimensions correspond to the country name, the year and the originating data.frame.
, , 1
2005 2006
Aruba 1 2
Angola 3 4
, , 2
2005 2006
Aruba 11 12
Angola 13 14
We can get various slices:
> a["Aruba",,]
1 2
2005 1 11
2006 2 12
> a[,"2005",]
1 2
Aruba 1 11
Angola 3 13
> a[,,2]
2005 2006
Aruba 11 12
Angola 13 14
DF1 <- structure(list(countryName = c("Aruba", "Angola"), `2005` = c(1L,
3L), `2006` = c(2L, 4L)), class = "data.frame", row.names = c(NA, -2L))
DF2 <- structure(list(countryName = c("Aruba", "Angola"), `2005` = c(11L,
13L), `2006` = c(12L, 14L)), class = "data.frame", row.names = c(NA, -2L))
> DF1
countryName 2005 2006
1 Aruba 1 2
2 Angola 3 4
> DF2
countryName 2005 2006
1 Aruba 11 12
2 Angola 13 14
Upvotes: 2