kimi

Reputation: 525

Lapply with ifelse condition

I have a data frame with 40 rows and 10 columns. I want to derive a new column called Prediction that equals the 'Value' column for rows whose index is up to 17, and is NA for the remaining rows. For example, given:

ID  Year    Value
1   2016    114235
2   2016    114235
3   2016    114235
4   2016    114235
5   2016    114235

Then:

ID  Year    Value   Prediction
1   2016    114235  114235
2   2016    114235  114235
3   2016    114235  114235
4   2016    114235  114235
5   2016    114235  114235

I tried the following, and every row of the new column was NA.

newdata$Prediction <- ifelse(nrow(newdata) <= 17, newdata$Value, NA)

newdata$Prediction <- lapply(newdata, function(x) ifelse(nrow(newdata) <= 17, newdata$Value, NA))

Neither attempt works. How can I do this?

Upvotes: 0

Views: 999

Answers (3)

Aramis7d

Reputation: 2496

Instead of using

newdata$Prediction <- ifelse(nrow(newdata) <= 17, newdata$Value, NA)

you can use something like

newdata$Prediction <- ifelse(as.numeric(rownames(newdata)) <= 17, newdata$Value, NA)

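The difference lies in what nrow() and rownames() return. nrow() gives a single number (the total row count, 40 in your case), so the condition is a single FALSE and ifelse() yields a single NA that is recycled into every row. rownames() gives one name per row, so the condition is evaluated row by row.

For example, with a threshold of 3, your sample input returns: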

  ID Year  Value Prediction
1  1 2016 114235     114235
2  2 2016 114235     114235
3  3 2016 114235     114235
4  4 2016 114235         NA
5  5 2016 114235         NA
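To make the nrow() versus per-row distinction concrete, here is a minimal sketch on a 5-row stand-in for the question's data, with a threshold of 3. The names cond_nrow, cond_rows, single and per_row are only illustrative. It uses seq_len(nrow(newdata)) as the row index, which is an assumption-free alternative when the row names may not be the default 1, 2, 3, ... (for instance after subsetting), since as.numeric(rownames(newdata)) relies on them being numeric and in order.

```r
# Sketch only: a 5-row stand-in for the question's data, threshold of 3
newdata <- data.frame(ID = 1:5, Year = 2016, Value = 114235)

# nrow() returns one number, so the test is a single logical value
cond_nrow <- nrow(newdata) <= 3
length(cond_nrow)

# ifelse() returns a result shaped like its test: a single NA here,
# which is then recycled into every row when assigned to a column
single <- ifelse(cond_nrow, newdata$Value, NA)

# A per-row test gives one logical value per row
cond_rows <- seq_len(nrow(newdata)) <= 3
per_row <- ifelse(cond_rows, newdata$Value, NA)

newdata$Prediction <- per_row
newdata
```

The only change from the original attempt is the test: it has to be a vector with one element per row rather than a single comparison on the row count.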

While the methods mentioned in the comments to your question are perfectly valid, I'm still posting this because your attempt wasn't too far off.


Alternatively, you could also try using tidyverse functions:

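library(dplyr)

newdata %>%
  mutate(rn = row_number()) %>%
  # if_else() needs a typed NA for `false`, not NULL; NA_real_ assumes Value is a double column
  mutate(Prediction = if_else(rn <= 3, Value, NA_real_)) %>%
  select(-rn)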

Upvotes: 2

StephenK

Reputation: 695

I think that you can just change one small thing in your code to get what you want.

newdata$Prediction <- ifelse(newdata$ID <= 17, newdata$Value, NA)

You already have an ID column that appears to be sorted and thus acts like the row number. nrow() just gives you the number of rows as a single value, and since your dataset has more than 17 rows, the condition is FALSE and you get NA on every row.

Upvotes: 2

SeGa

Reputation: 9809

Do you need the lapply function?

You could just do something like this:

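n_rows <- 20  # a variable called `nrow` would mask the name of the base function
newdata <- data.frame(ID = 1:n_rows,
                      Year = rep(2016, n_rows),
                      Value = rep(114235, n_rows))

# Start with a full copy of Value, then blank out the tail
newdata$Prediction <- newdata$Value
if (nrow(newdata) > 17) {
  # Starts at row 17, so rows 1 to 16 keep their value; use 18 to keep row 17 as well
  newdata[17:nrow(newdata), ]$Prediction <- NA
}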

newdata

This way, the data is left unchanged when there are 17 or fewer rows. Without the if check, the assignment would add new rows and fill them with NA.
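As a possible variation that needs no row-count check, the sketch below uses base R's replace() with a per-row logical index. add_prediction and its keep argument are hypothetical names introduced here for illustration, not part of any answer. Note that it keeps rows 1 to 17 (matching the `<= 17` test in the question), whereas the code above also sets row 17 to NA.

```r
# Hypothetical helper: keep Value for the first `keep` rows, NA afterwards
add_prediction <- function(df, keep = 17) {
  # The logical index has exactly one entry per existing row,
  # so no rows can be added, whatever the size of df
  df$Prediction <- replace(df$Value, seq_len(nrow(df)) > keep, NA)
  df
}

small <- add_prediction(data.frame(ID = 1:5, Year = 2016, Value = 114235))
large <- add_prediction(data.frame(ID = 1:20, Year = 2016, Value = 114235))

nrow(small)                   # the short data frame keeps its 5 rows
sum(is.na(large$Prediction))  # number of rows set to NA (rows 18 to 20)
```

Because the index is built from seq_len(nrow(df)), the same call should behave sensibly for data frames both shorter and longer than the threshold.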

Upvotes: 2
