Phil.He
Phil.He

Reputation: 154

Lowest positive and least negative value among various columns in R?

I have a dataset looking like this:

df <- data.frame(ID=c(1, 1, 1, 2, 3, 3), timeA=c(-10, NA, NA, -15, -10, -5), timeB=c(5, 100, -10, -10, -15, 5), timeC=c(1, 160, 17, -5, -5, 2))

Question 1: I want to create a column giving me the lowest positive value of time for each participant or if all values are negative then keep the negative value in and choose the one that is least negative. Then I want to only choose the lowest positive value for each participant (ID), or when all values are negative, choose the value that is least negative.

Question 2: Is there a function looking for the value that is closest to 0?

So that my output would look like this:

df <- data.frame(ID=c(1,2,3), time_new=c(1, -5, 2))

Upvotes: 0

Views: 259

Answers (2)

Sinh Nguyen
Sinh Nguyen

Reputation: 4487

Simply calculate distance to 0 and then filter

For #1

library(tidyverse)

# function filter check and return a TRUE/FALSE with
# follow logic of #1 - priority positive value first
# if no positive take the maximum negative number
filter_function <- function(x) {
  result <- rep(0, length(x))
  if (all(x < 0, na.rm = TRUE)) {
    reference <- max(x, na.rm = TRUE)
  } else {
    reference <- min(x[x > 0], na.rm = TRUE)
  }
  result <- result + (x == reference)
  result[is.na(result)] <- 0
  
  as.logical(result)
}

# filter as #1 option
df %>% pivot_longer(!ID,values_to = "value") %>% 
  # calculate the distance to ZERO for each value
  mutate(distance_to_zero = 0 + value,
    abs_distance_to_zero = abs(distance_to_zero)) %>%
  group_by(ID) %>% 
  filter(filter_function(distance_to_zero))
#> # A tibble: 3 x 5
#> # Groups:   ID [3]
#>      ID name  value distance_to_zero abs_distance_to_zero
#>   <dbl> <chr> <dbl>            <dbl>                <dbl>
#> 1     1 timeC     1                1                    1
#> 2     2 timeC    -5               -5                    5
#> 3     3 timeC     2                2                    2

And this is for #2

# filter as closest to ZERO no matter positive or negative
df %>% 
  pivot_longer(!ID,values_to = "value") %>% 
  # calculate the distance to ZERO for each value
  mutate(abs_distance_to_zero = abs(0 + value)) %>%
  group_by(ID) %>% 
  # Then filter by the one equal to minimum in each group can return multiple
  # records in your actual data
  filter(abs_distance_to_zero == min(abs_distance_to_zero, na.rm = TRUE) &
          !is.na(abs_distance_to_zero)) %>%
  ungroup()
#> # A tibble: 3 x 4
#>      ID name  value abs_distance_to_zero
#>   <dbl> <chr> <dbl>                <dbl>
#> 1     1 timeC     1                    1
#> 2     2 timeC    -5                    5
#> 3     3 timeC     2                    2

Upvotes: 1

Kat
Kat

Reputation: 18714

I think your looking for Closest() from the library DescTools.

library(tidyverse)
library(DescTools)

# your data
df <- data.frame(ID=c(1, 1, 1, 2, 3, 3), 
                 timeA=c(-10, NA, NA, -15, -10, -5), 
                 timeB=c(5, 100, -10, -10, -15, 5), 
                 timeC=c(1, 160, 17, -5, -5, 2))

# your results
# I stacked the information for easier searching
df %>% pivot_longer(!ID,values_to = "value") %>% 
  group_by(ID) %>% 
  summarise(time_new = Closest(value, 0, na.rm = T)) # closest value to zero

Upvotes: 2

Related Questions