Reputation: 3
I have two data frames. The first, df1, contains countries and years. The second, df2, contains data that I want to add to df1 as a third column, matching each row of df1 by its country (a row of df2) and its year (a column of df2).
df1
country year
1 A 2008
2 B 2008
3 C 2009
4 F 2004
5 E 2006
df2
country 2004 2005 2006 2007 2008 2009
1 A 3,74972737 3,69814069 1,8119572 2,0058797 2,3728207 3,63424962
2 B 3,62151043 1,54726382 -3,799075 1,92867306 2,92279764 0,68044437
3 C 25,0489995 10,7724208 9,41065376 4,85433932 0,06592277 2,20000019
4 F 4,78583195 5,04811878 3,46842543 3,78590254 4,19162568 4,01936553
5 E 3,44897379 0,78317304 -2,2531746 2,74421327 1,79830266 0,23479692
6 F 5,98651552 4,89339392 2,31922692 2,11685013 2,96275035 4,81028341
7 G 5,65500512 7,29449815 2,96201437 5,37337313 6,62686519 6,45269876
8 H 7,05863621 6,01378976 5,04512479 5,57180227 6,46438388 6,52143508
9 I 7,67535068 3,63781612 -3,5861456 1,32402682 1,91501801 0,03094361
This is what I want to achieve:
country year gdp
1 A 2008 2.372821
2 B 2008 2.922798
3 C 2009 2.200000
4 F 2004 5.986516
5 E 2006 -2.253175
I am sure there is a very simple answer to this problem. How can I bring the data of df2 to df1?
I tried to use dplyr::mutate to achieve it:
library(dplyr)
df1 <- mutate(df1, gdp = {
df2[which(df2$country == country),
which(colnames(df2) == year)]})
However, the following error message comes up:
Error in which(colnames(df2) == year) : object 'year' not found
Upvotes: 0
Views: 2219
Reputation: 39154
A solution using dplyr and tidyr. The key is to convert df2 to long format using gather. After that, we can conduct a merge operation with left_join. The last mutate call could be unnecessary if the decimal separators in your data frame are all "." rather than ",". df3 is the final output.
library(dplyr)
library(tidyr)
df3 <- df1 %>%
  left_join(df2 %>% gather(year, gdp, -country, convert = TRUE),
            by = c("country", "year")) %>%
  mutate(gdp = as.numeric(sub(",", "\\.", gdp)))
df3
# country year gdp
# 1 A 2008 2.372821
# 2 B 2008 2.922798
# 3 C 2009 2.200000
# 4 F 2004 4.785832
# 5 F 2004 5.986516
# 6 E 2006 -2.253175
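Note that the output has six rows rather than five because df2 lists country F twice, so the join returns both matches for F in 2004. As a side note, gather is superseded in recent tidyr releases. The sketch below shows roughly the same reshape-then-join idea with pivot_longer, assuming tidyr 1.0.0 or later is installed. The small df1 and df2 here are stand-ins built for illustration (a subset of the values above), and df2_long is a name introduced only for this example.

```r
library(dplyr)
library(tidyr)

# Stand-in data with the same shape as the question's df1 and df2 (assumption:
# year is integer in df1 and the gdp values are comma-decimal strings in df2)
df1 <- data.frame(country = c("A", "F"), year = c(2008L, 2004L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(country = c("A", "F", "F"),
                  `2004` = c("3,74972737", "4,78583195", "5,98651552"),
                  `2008` = c("2,3728207", "4,19162568", "2,96275035"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Reshape df2 so that each (country, year) pair is one row
df2_long <- df2 %>%
  pivot_longer(-country, names_to = "year", values_to = "gdp") %>%
  mutate(year = as.integer(year),                              # column names arrive as text
         gdp = as.numeric(sub(",", ".", gdp, fixed = TRUE)))   # comma decimal to dot

# Look up each (country, year) of df1 in the long table
df3 <- left_join(df1, df2_long, by = c("country", "year"))
df3
```

Because F appears twice in the stand-in df2 as well, df3 has three rows here. If only one of the duplicate rows is wanted, df2 would need to be deduplicated before the join, and which row to keep depends on the data.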
DATA
df1 <- read.table(text = "country year
1 A 2008
2 B 2008
3 C 2009
4 F 2004
5 E 2006",
header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = " country 2004 2005 2006 2007 2008 2009
1 A 3,74972737 3,69814069 1,8119572 2,0058797 2,3728207 3,63424962
2 B 3,62151043 1,54726382 -3,799075 1,92867306 2,92279764 0,68044437
3 C 25,0489995 10,7724208 9,41065376 4,85433932 0,06592277 2,20000019
4 F 4,78583195 5,04811878 3,46842543 3,78590254 4,19162568 4,01936553
5 E 3,44897379 0,78317304 -2,2531746 2,74421327 1,79830266 0,23479692
6 F 5,98651552 4,89339392 2,31922692 2,11685013 2,96275035 4,81028341
7 G 5,65500512 7,29449815 2,96201437 5,37337313 6,62686519 6,45269876
8 H 7,05863621 6,01378976 5,04512479 5,57180227 6,46438388 6,52143508
9 I 7,67535068 3,63781612 -3,5861456 1,32402682 1,91501801 0,03094361",
header = TRUE, stringsAsFactors = FALSE)
names(df2) <- c("country", 2004:2009)
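If the data come from a text source like the one above, another option is to let read.table handle the decimal commas and the numeric column names at import time, which would make both the names(df2) fix and the final mutate call unnecessary. This is a minimal sketch on a two-row subset of the data, using the dec and check.names arguments of read.table.

```r
# dec = "," parses "3,74972737" as the number 3.74972737;
# check.names = FALSE keeps the headers as "2004", "2005" instead of "X2004", "X2005"
df2 <- read.table(text = " country 2004 2005
1 A 3,74972737 3,69814069
2 B 3,62151043 1,54726382",
                  header = TRUE, dec = ",", check.names = FALSE,
                  stringsAsFactors = FALSE)

str(df2)
```

With the value columns already numeric, the reshape and left_join steps can be applied as before without the comma substitution.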
Upvotes: 3