Tony

Conditional Replacement between two dataframes to construct new variable

In my data, I construct synthetic estimates for all observations within a data.frame. However, for some observations there are observed values that I would like to use instead of the synthetic estimates. In my real data, which observations have observed values varies by year, crop type, and county, so I am trying to construct something general that conditionally replaces the estimates with whatever is actually observed. I've made a trivial example to show what I mean.

# Ideal example: this works because alt.df lists its names in the same order as df
set.seed(1234)

df <- data.frame(Name = LETTERS[1:8], Estimated = 5*rnorm(8))
df

alt.df <- data.frame(Name = c('A', 'F'), Observed = 3*runif(2))
alt.df

df$Combined[df$Name %in% alt.df$Name] <- alt.df$Observed
df$Combined[is.na(df$Combined)]  <- df$Estimated[is.na(df$Combined)]
df

#Example doesn't work because the order of alt.df$Name is set as (F, A)
set.seed(1234)

df <- data.frame(Name = LETTERS[1:8], Estimated = 5*rnorm(8))
df

alt.df <- data.frame(Name = c('F', 'A'), Observed = 3*runif(2))
alt.df

# Error: the value for "F" (0.85...) is assigned to "A"
df$Combined[df$Name %in% alt.df$Name] <- alt.df$Observed
df$Combined[is.na(df$Combined)]  <- df$Estimated[is.na(df$Combined)]
df
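The failing version assigns by position: `df$Name %in% alt.df$Name` picks out rows A and F in `df`'s order, while `alt.df$Observed` is supplied in `alt.df`'s order (F, then A). One possible base-R fix, sketched below, looks up each name with `match()` so the values are aligned by name rather than by position. It rebuilds the same toy data so that it runs on its own.

```r
# Same data as the failing example: alt.df lists F before A
set.seed(1234)
df <- data.frame(Name = LETTERS[1:8], Estimated = 5*rnorm(8))
alt.df <- data.frame(Name = c('F', 'A'), Observed = 3*runif(2))

# For each row of df, the row of alt.df with the same Name (NA if not observed)
idx <- match(df$Name, alt.df$Name)

# Use the observed value where one exists, otherwise keep the estimate
df$Combined <- ifelse(is.na(idx), df$Estimated, alt.df$Observed[idx])
df
```

Because `match()` compares values rather than positions, the row order of `alt.df` no longer matters, and it behaves the same whether `Name` is a factor or a character column.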

I've struggled with this for the last several days and have looked closely at other Stack Overflow posts, including:

Replace a value in a data frame based on a conditional (`if`) statement in R

Changing values in list if that value meets criteria in R

and numerous others.

They contain a lot of useful information, and I have worked through their examples, but I still cannot figure out how to generalize their solutions to my case. I am not trying to replace a single value; I am pulling information from another data set (which might vary) and constructing a new variable that merges the synthetic and observed information into a single variable, matched by the identifiers (the letters, in the trivial example). The trivial example uses factors, but I do not need them; I currently import my data with the option stringsAsFactors = FALSE. So if it is easier without factors, let me know.
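With several identifiers (year, crop type, county), one way to keep the match-by-name idea is to collapse the key columns into a single composite key per row and `match()` on that. The sketch below is illustrative only: the column names `Year`, `Crop` and `County`, the helper `make_key`, and all the numbers are hypothetical stand-ins for the real data, and it uses `stringsAsFactors = FALSE` as in the real import.

```r
# Hypothetical stand-ins for the real data: Year, Crop and County are assumed key columns
est <- data.frame(
  Year      = c(2010, 2010, 2011, 2011),
  Crop      = c("corn", "soy", "corn", "soy"),
  County    = c("X", "X", "Y", "Y"),
  Estimated = c(10.2, 7.5, 11.1, 6.9),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  Year     = c(2011, 2010),          # deliberately not in est's order
  Crop     = c("soy", "corn"),
  County   = c("Y", "X"),
  Observed = c(7.3, 9.8),
  stringsAsFactors = FALSE
)

keys <- c("Year", "Crop", "County")

# Hypothetical helper: collapse the key columns into one string per row, e.g. "2010|corn|X"
make_key <- function(d) do.call(paste, c(d[keys], sep = "|"))

# Row of obs matching each row of est on all keys (NA if not observed)
idx <- match(make_key(est), make_key(obs))
est$Combined <- ifelse(is.na(idx), est$Estimated, obs$Observed[idx])
est
```

This keeps the rows of `est` in their original order; the separator just needs to be a character that never appears inside the key values.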

I'm sure it's something simple that I'm missing...

Answers (1)

Colonel Beauvel

For a generic case:

Data

set.seed(1234)

df <- data.frame(Name = LETTERS[1:8], Estimated = 5*rnorm(8))
alt.df <- data.frame(Name = c('A', 'F'), Observed = 3*runif(2))

What you are looking for is basically a merge (a left join) on the Name key. This can be done with the data.table package:

library(data.table)

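setDT(df)             # convert both data.frames to data.tables by reference
setDT(alt.df)
setkey(alt.df, Name)  # key alt.df on Name so that alt.df[df] joins on it

# Left join: every row of df is kept; Observed is NA where Name has no observed value
dt <- alt.df[df]
transform(dt, Combined = ifelse(is.na(Observed), Estimated, Observed))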
#   Name  Observed  Estimated    Combined
#1:    A 0.8586699  -6.035329   0.8586699
#2:    B        NA   1.387146   1.3871462
#3:    C        NA   5.422206   5.4222059
#4:    D        NA -11.728489 -11.7284885
#5:    E        NA   2.145623   2.1456234
#6:    F 0.8004623   2.530279   0.8004623
#7:    G        NA  -2.873700  -2.8736998
#8:    H        NA  -2.733159  -2.7331593
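If adding the data.table dependency is not wanted, the same left join can be sketched with base R's `merge()`. The sketch deliberately uses the (F, A) order from the question's failing example to show that the join matches on `Name` rather than on position. Unlike `alt.df[df]`, `merge()` sorts the result by the key columns by default, and `by` also accepts a vector of several key column names.

```r
set.seed(1234)
df <- data.frame(Name = LETTERS[1:8], Estimated = 5*rnorm(8), stringsAsFactors = FALSE)
alt.df <- data.frame(Name = c('F', 'A'), Observed = 3*runif(2), stringsAsFactors = FALSE)

# Left join on Name: all rows of df are kept; Observed is NA where Name is unmatched
res <- merge(df, alt.df, by = "Name", all.x = TRUE)
res$Combined <- ifelse(is.na(res$Observed), res$Estimated, res$Observed)
res
```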