Find max negative and min positive numbers in a dataframe with negatives and positives

Question

I would like to take the maximum negative value of a column containing negatives and positives (diff_start), and the minimum positive value of another column (diff_end), in R.

Data:

data <- read.table(text ="
                   id lab diff_start diff_end
                   1 hb -1.7 -1.8
                   1 hb -0.3 -0.3
                   1 hb 0.6 0.5
                   1 hb 0.7 0.8", header = TRUE)

Desired Output:

# id lab   diff_start diff_end
# 1 hb     -0.3      0.5

What I have done:

Split the data into 2, and filter <= 0 for diff_start and >= 0 for diff_end
Obtain the summaries of interest, and then merge them back

I think this is pretty long and inefficient, and hope to make it more succinct.

library(dplyr)

full_join(
  data %>% 
    group_by(id, lab) %>% 
    filter(diff_start <= 0) %>% 
    summarise(diff_start = max(diff_start)) %>% 
    ungroup(),
  data %>% 
    group_by(id, lab) %>% 
    filter(diff_end >= 0) %>% 
    summarise(diff_end = min(diff_end)) %>% 
    ungroup())

Clemsang · Accepted Answer

You can factorise your code this way:

data %>% 
  group_by(id, lab) %>% 
  summarise(diff_start = max(diff_start[diff_start <= 0]), diff_end = min(diff_end[diff_end >= 0])) %>% 
  ungroup()
# A tibble: 1 x 4
#      id lab   diff_start diff_end
# 1     1 hb          -0.3      0.5

There is no need to filter first: you can subset each column directly inside summarise().
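To see what the subsetting does, here is a minimal base R sketch on a plain vector holding the diff_start values from the question. The second vector (only_pos) is a hypothetical group I made up to show what happens when nothing passes the condition; it is not part of the original data.

```r
# Values of diff_start from the question's example data
diff_start <- c(-1.7, -0.3, 0.6, 0.7)

# Logical indexing keeps only the non-positive values;
# max() then picks the one closest to zero
max_neg <- max(diff_start[diff_start <= 0])
max_neg
# [1] -0.3

# Hypothetical group with no non-positive values: the subset is empty,
# so max() returns -Inf and emits a warning (suppressed here)
only_pos <- c(0.6, 0.7)
empty_max <- suppressWarnings(max(only_pos[only_pos <= 0]))
empty_max
# [1] -Inf
```

The -Inf result is the reason a group without any negatives (or without any positives) needs separate handling.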

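To deal with groups that have no negatives or no positives (where max() or min() would be called on an empty vector and return -Inf or Inf with a warning), return NA instead: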

data %>% 
  group_by(id, lab) %>% 
  summarise(
    # na.rm = TRUE (not na.omit) is the argument that makes sum(), max() and min() skip NAs
    diff_start = if(sum(diff_start <= 0, na.rm = TRUE) == 0) NA else max(diff_start[diff_start <= 0], na.rm = TRUE),
    diff_end = if(sum(diff_end >= 0, na.rm = TRUE) == 0) NA else min(diff_end[diff_end >= 0], na.rm = TRUE)
  ) %>% 
  ungroup()
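
If the inline if/else gets hard to read, the same logic could be moved into two small helpers. This is only a sketch: the names max_nonpos and min_nonneg are hypothetical (not from dplyr or base R), and it assumes that NA is the result you want when no value qualifies.

```r
# Hypothetical helper: largest value that is <= 0, or NA if there is none
max_nonpos <- function(x) {
  x <- x[!is.na(x) & x <= 0]   # drop NAs and positive values
  if (length(x) == 0) NA_real_ else max(x)
}

# Hypothetical helper: smallest value that is >= 0, or NA if there is none
min_nonneg <- function(x) {
  x <- x[!is.na(x) & x >= 0]   # drop NAs and negative values
  if (length(x) == 0) NA_real_ else min(x)
}

# Columns from the question's example data
start_res <- max_nonpos(c(-1.7, -0.3, 0.6, 0.7))
end_res   <- min_nonneg(c(-1.8, -0.3, 0.5, 0.8))

# Made-up vector with no non-positive values and one NA
none_res  <- max_nonpos(c(0.6, NA, 0.7))

start_res
# [1] -0.3
end_res
# [1] 0.5
none_res
# [1] NA
```

Inside the pipeline these would be called as summarise(diff_start = max_nonpos(diff_start), diff_end = min_nonneg(diff_end)).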
