Reputation: 241
Hi I have two data frames (df1 and df2) with two shared variables (ID and Yr). I want to update the values in a third variable (value) in df1 with the new data in the respective value in df2. But below code does not update the value in df1, it seems the values are not passed to the corresponding cels in df1.
df1 = data.frame(ID = c("a","b","c","d","e") ,
Yr = c(2000,2001,2002,2003,2004),
value= c(100,100,100,100, 100))
df2 = data.frame(ID = c("a","b","c") ,
Yr = c(2000,2001,2002),
valuenew= c(200,150,120))
for (i in 1:nrow(df2)){
id <- df2[i,'ID']
year <- df2[i, 'Yr']
valuenew<- df2[i, 'valuenew']
df1[which (df1$ID == id & df1$Yr == year), 'value'] <- valuenew
}
the desired result
ID Yr value
a 2000 200
b 2001 150
c 2002 120
d 2003 100
e 2004 100
The real data I use with which none of these solutions works
df1
head(df1, 5)
CoreID Yr FluxTot
1 Asmund2000_Greenland coast_4001 1987 0.3239693
2 Asmund2000_Greenland coast_4001 1986 0.2864100
3 Asmund2000_Greenland coast_4001 1985 0.2488508
4 Asmund2000_Greenland coast_4001 1984 0.2964794
5 Asmund2000_Greenland coast_4001 1983 0.3441080
df2
head(df2, 5)
CoreID Yr GamfitHgdep
1 Beal2015_Mount Logan 2000 0.01105077
2 Eyrikh2017_Belukha glacier 2000 0.02632597
3 Zheng2014_Mt. Oxford 2000 0.01377599
4 Zheng2014_Agassiz 2000 0.01940151
5 Zheng2014_NEEM-2010-S3 2000 -0.01483026
#merged database
m<-merge(df1, df2)
head(m,5)
CoreID Yr FluxTot GamfitHgdep
1 Beal2014_Yanacocha 2000 0.003365556 0.024941373
2 Beal2014_Yanacocha 2001 0.003423333 0.027831253
3 Beal2014_Yanacocha 2002 0.003481111 -0.002908330
4 Beal2014_Yanacocha 2003 0.003538889 -0.004591100
5 Beal2014_Yanacocha 2004 0.003596667 0.005189858
Below is the exact code I used to do the trick but failed. No difference if the value assigning part is replaced with any other solutions. No warning, no error raised.
library(readxl)
library(dplyr)
metal = 'Hg'
df = read_excel('All core data.xlsx','Sheet1')
df = data.frame(df)
df1 <- df[which (df$Metal==metal),]
rownames(df1) = seq(length=nrow(df1))
head(df1, 5)
dfgam = read_excel('GAM prediction.xlsx','Sheet1')
df2 <- data.frame(dfgam)
head(df2, 5)
for (i in 1:nrow(df2)){
coreid <- df2[i,'CoreID']
year <- df2[i, 'Yr']
predicted<- df2[i, 'GamfitHgdep']
df1[which (df1$CoreID == coreid & df1$Yr == year), 'FluxTot'] <- predicted
}
after running the code, the values in df1 have not changed, for instance
the value should be 0.024941373 as shown in head(m,5)
Upvotes: 4
Views: 366
Reputation: 149
Your example is working and updating df1
just fine.
However, to add one more solution, you can try the lines below without using a for loop or attaching extra packages:
key <- paste(df1$ID, df1$Yr)
values <- setNames(df2$value, paste(df2$ID, df2$Yr))[key]
df1$value[!is.na(values)] <- values[!is.na(values)]
Maybe something worth to mention in general for your problem, make sure you don't have any duplicated ID/Yr combinations in df2
...
EDIT:
Sorry, I was terrible at helping you! Providing just another working solution is not helpful at all. So here's my attempt to help you further.
First, check that you have the classes/types that you expect for the columns that you compare.
Next - usually I'd recommend placing a browser()
in your code (e.g. before your assignment/last line in your example:
for (i in 1:nrow(df2)){
id <- df2[i,'ID']
year <- df2[i, 'Yr']
valuenew<- df2[i, 'valuenew']
browser()
df1[which (df1$ID == id & df1$Yr == year), 'value'] <- valuenew
}
This is especially helpful if you need to debug a function. However in your case you can step through your for loop manually, which is a bit simpler to handle:
Assign the first value to your iterator i <- 1
and run the code inside your for loop. Is which(df1$ID == id & df1$Yr == year)
really returning what you expect?
If you can't find any issues, increment i
by 1
and proceed with debugging...
Upvotes: 1
Reputation: 4425
You can try this for
loop
for(i in 1:nrow(df1)){
y <- which(df1$Yr[i] == df2$Yr)
if(length(y) > 0) df1$value[i] <- df2$valuenew[y]
}
ID Yr value
1 a 2000 200
2 b 2001 150
3 c 2002 120
4 d 2003 100
5 e 2004 100
Upvotes: 0
Reputation: 20445
Since dplyr
version 1.0.0, you can use rows_update
for this:
dplyr::rows_update(
df1,
rename(df2, value=valuenew),
by = c("ID", "Yr")
)
# ID Yr value
# 1 a 2000 200
# 2 b 2001 150
# 3 c 2002 120
# 4 d 2003 100
# 5 e 2004 100
Upvotes: 4
Reputation: 41367
Option using data.table
:
df1 = data.frame(ID = c("a","b","c","d","e") ,
Yr = c(2000,2001,2002,2003,2004),
value= c(100,100,100,100, 100))
df2 = data.frame(ID = c("a","b","c") ,
Yr = c(2000,2001,2002),
valuenew= c(200,150,120))
library(data.table)
setDT(df1)[df2, value := i.valuenew, on = .(ID, Yr)]
df1
#> ID Yr value
#> 1: a 2000 200
#> 2: b 2001 150
#> 3: c 2002 120
#> 4: d 2003 100
#> 5: e 2004 100
Created on 2022-07-05 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 78947
We could use a join for this: For example left_join
library(dplyr)
left_join(df1, df2, by="ID") %>%
mutate(value = ifelse(!is.na(valuenew), valuenew, value)) %>%
select(ID, Yr=Yr.x, value)
ID Yr value
1 a 2000 200
2 b 2001 150
3 c 2002 120
4 d 2003 100
5 e 2004 100
Upvotes: 1