Oksana

Reputation: 9

If() statement in R

I am not very experienced with if statements and loops in R.

Perhaps you can help me solve my problem.

My task is to add 1 to df$fz if sum(df$fz) < 450, but at the same time I have to add 1 only to the max values in df$fz, and only for as long as sum(df$fz) stays below 450.

Here is my df

ID_PP <- c(3,6, 22, 30, 1234456)
z <- c(12325, 21698, 21725, 8378, 18979)
fz <- c(134, 67, 70, 88, 88)

df <- data.frame(ID_PP,z,fz)

After mutating the new column df$new_value, it should look like 134 68 71 88 89

At this moment I have this code, but it adds +1 to all values.

if (sum(df$fz ) < 450) {
  mutate(df, new_value=fz+1)
 }

I know that I can pick top_n(3, z) and add 1 only to that top, but this is not what I want, because then I have to choose the size of the top manually after checking sum(df$fz).

Upvotes: 0

Views: 111

Answers (2)

Andrew

Reputation: 5138

The clarifications in the comments helped. Let me know if this works for you. Of course, you can drop the cumsum_fz and leftover columns.

# Making variables to use in the calculation
library(dplyr)

df <- df %>%
  arrange(fz) %>%
  mutate(cumsum_fz = cumsum(fz),
         leftover = 450 - cumsum_fz)

# Find the minimum, non-negative value to use for select values that need +1
min_pos <- min(df$leftover[df$leftover > 0])

# Creating a vector that adds 1 using the min_pos value and keeps
# the other values the same
df$new_value <- c((head(sort(df$fz), min_pos) + 1), tail(sort(df$fz), length(df$fz) - min_pos))

# Checking the sum of the new value
> sum(df$new_value)
[1] 450
> 
> df
    ID_PP     z  fz cumsum_fz leftover new_value
1       6 21698  67        67      383        68
2      22 21725  70       137      313        71
3      30  8378  88       225      225        89
4 1234456 18979  88       313      137        88
5       3 12325 134       447        3       134

EDIT:

Because utubun already posted a great tidyverse solution, I am going to translate my first one completely to base (it was a bit sloppy to mix the two anyway). Same logic as above, and using the data OP provided.

> # Using base
> df <- df[order(df$fz), ]
> 
> # Use the sorted column from df, not the global fz vector
> leftover <- 450 - cumsum(df$fz)
> min_pos <- min(leftover[leftover > 0])
> df$new_value <- c((head(sort(df$fz), min_pos) + 1), tail(sort(df$fz), length(df$fz) - min_pos))
> 
> sum(df$new_value)
[1] 450
> df
    ID_PP     z  fz new_value
2       6 21698  67        68
3      22 21725  70        71
4      30  8378  88        89
5 1234456 18979  88        88
1       3 12325 134       134
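
A side note on min_pos: the last element of the cumulative sum is the total, so whenever the total is below 450, min_pos is simply 450 - sum(fz). Here is a minimal sketch of the same logic wrapped in a function. The helper name bump_smallest and its target argument are hypothetical (they are not part of the code above), and it assumes integer-valued fz. It also leaves the vector unchanged when the sum is already at or above the target, and it keeps the original row order.

```r
# Hypothetical helper (an illustration, not the code from the answer above):
# add 1 to the k smallest values of x, where k = target - sum(x).
# Assumes x holds whole numbers. Returns x unchanged if sum(x) >= target.
bump_smallest <- function(x, target = 450) {
  k <- max(target - sum(x), 0)   # how many +1 steps are still needed
  k <- min(k, length(x))         # at most one +1 per element
  idx <- order(x)[seq_len(k)]    # positions of the k smallest values
  x[idx] <- x[idx] + 1
  x
}

fz <- c(134, 67, 70, 88, 88)
new_value <- bump_smallest(fz)
new_value        # 134 68 71 89 88
sum(new_value)   # 450
```

Because order() breaks ties by position, the first of the two 88s is the one that gets the +1, which matches the tables above.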

Upvotes: 1

utubun

Reputation: 4520

From what I understood from @Oksana's question and comments, we probably can do it this way:

library(tidyverse)

# data
vru <- data.frame(
  id = c(3, 6, 22, 30, 1234456),
  z  = c(12325, 21698, 21725, 8378, 18979),
  fz = c(134, 67, 70, 88, 88)
)

# solution
vru %>%                             #
  top_n(450 - sum(fz), z) %>%       # subset by top z; if sum(fz) == 450 -> empty subset
  mutate(fz = fz + 1) %>%           # increase fz by 1 for the subset
  bind_rows(                        #
    anti_join(vru, ., by = "id"),   # take rows from vru which are not in subset
    .                               # take subset with transformed fz
  ) %>%                             # bind those subsets
  arrange(id)                       # sort rows by id

# output
       id     z  fz
1       3 12325 134
2       6 21698  68
3      22 21725  71
4      30  8378  88
5 1234456 18979  89
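
If you would rather avoid the join and keep the rows in place, the same idea can probably be sketched in base R. This is only an illustration under a few assumptions: the names k and top are mine, fz holds whole numbers, and ties in z are broken by position (top_n() would keep all tied rows instead). The result goes into a new_value column, as in the question, rather than overwriting fz.

```r
vru <- data.frame(
  id = c(3, 6, 22, 30, 1234456),
  z  = c(12325, 21698, 21725, 8378, 18979),
  fz = c(134, 67, 70, 88, 88)
)

# Number of rows that need +1; zero if sum(fz) is already 450 or more
k <- max(450 - sum(vru$fz), 0)
k <- min(k, nrow(vru))

# Row positions of the k largest z values
top <- order(vru$z, decreasing = TRUE)[seq_len(k)]

# Copy fz and bump only the selected rows
vru$new_value <- vru$fz
vru$new_value[top] <- vru$new_value[top] + 1

vru$new_value   # 134 68 71 88 89
```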

Upvotes: 1
