Mayou

Reputation: 8848

Complex subsetting of dataframe

Consider the following dataframe:

df <- data.frame(Asset = c("A", "B", "C"), Historical = c(0.05,0.04,0.03), Forecast = c(0.04,0.02,NA))

#  Asset Historical Forecast
#1     A       0.05     0.04
#2     B       0.04     0.02
#3     C       0.03       NA

Also consider a variable x, which the user sets at the beginning of the R script. It can take one of two values: x = "Forecast" or x = "Historical".

If x = "Forecast", I would like to return, for each asset, the value from the "Forecast" column when a forecast is available, and the value from the "Historical" column otherwise. In the expected output below, A and B have forecast values, so those are returned. C is missing a forecast value, so its historical value is returned instead.

   Asset     Return 
 1     A       0.04     
 2     B       0.02     
 3     C       0.03     

If, however, x = "Historical", simply return the Historical column:

   Asset  Historical 
 1     A       0.05     
 2     B       0.04     
 3     C       0.03     

I can't come up with an easy way of doing it, and brute force is very inefficient if you have a large number of rows. Any ideas?

Thanks!

Upvotes: 1

Views: 96

Answers (1)

flodel

Reputation: 89097

First, pre-process your data so that every missing forecast is replaced by the corresponding historical value:

df2 <- transform(df, Forecast = ifelse(!is.na(Forecast), Forecast, Historical))

Then extract the Asset column together with the column named by x:

df2[c("Asset", x)]
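For reference, here is a minimal end-to-end sketch of the two steps, using the example data from the question. The names forecast_view and historical_view are only illustrative and not part of the original code. Note that with x = "Forecast" the value column keeps the name Forecast rather than Return.

```r
# Example data from the question
df <- data.frame(Asset = c("A", "B", "C"),
                 Historical = c(0.05, 0.04, 0.03),
                 Forecast = c(0.04, 0.02, NA))

# Fill missing forecasts with the historical value (vectorised, no loop)
df2 <- transform(df, Forecast = ifelse(!is.na(Forecast), Forecast, Historical))

# x = "Forecast": forecast where available, historical otherwise
x <- "Forecast"
forecast_view <- df2[c("Asset", x)]
print(forecast_view)

# x = "Historical": the Historical column, unchanged
x <- "Historical"
historical_view <- df2[c("Asset", x)]
print(historical_view)
```

Because ifelse works on whole columns at once, this should scale to a large number of rows without a row-by-row loop.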

Upvotes: 4
