HNSKD

Reputation: 1644

Find max negative and min positive numbers in a dataframe with negatives and positives

In R, I would like to take the largest negative value (the one closest to zero) of a column that contains both negatives and positives (diff_start), and the smallest positive value of another column (diff_end).

Data:

data <- read.table(text ="
                   id lab diff_start diff_end
                   1 hb -1.7 -1.8
                   1 hb -0.3 -0.3
                   1 hb 0.6 0.5
                   1 hb 0.7 0.8", header = TRUE)

Desired Output:

# id lab   diff_start diff_end
# 1 hb     -0.3      0.5

What I have done:

I think this is pretty long and inefficient, and hope to make it more succinct.

library(dplyr)

full_join(
  data %>%
    group_by(id, lab) %>%
    filter(diff_start <= 0) %>%
    summarise(diff_start = max(diff_start)) %>%
    ungroup(),
  data %>%
    group_by(id, lab) %>%
    filter(diff_end >= 0) %>%
    summarise(diff_end = min(diff_end)) %>%
    ungroup())

Upvotes: 1

Views: 1905

Answers (2)

Clemsang

Reputation: 5481

You can factorise your code this way:

data %>% 
  group_by(id, lab) %>% 
  summarise(diff_start = max(diff_start[diff_start <= 0]), diff_end = min(diff_end[diff_end >= 0])) %>% 
  ungroup()
# A tibble: 1 x 4
     id lab   diff_start diff_end
  <int> <fct>      <dbl>    <dbl>
1     1 hb          -0.3      0.5

There is no need to filter first; you can subset each column directly inside summarise().

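To handle groups with no negative or no positive values (where max() or min() on an empty vector would return -Inf or Inf with a warning), return NA instead: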

data %>%
  group_by(id, lab) %>%
  summarise(
    # na.rm (not na.omit) is the argument that drops missing values
    diff_start = if (sum(diff_start <= 0, na.rm = TRUE) == 0) NA_real_ else max(diff_start[diff_start <= 0], na.rm = TRUE),
    diff_end = if (sum(diff_end >= 0, na.rm = TRUE) == 0) NA_real_ else min(diff_end[diff_end >= 0], na.rm = TRUE)
  ) %>%
  ungroup()
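
The same guard can be wrapped in small helper functions so the summarise() call stays short. This is only a minimal sketch in base R; the names max_neg and min_pos are hypothetical and do not come from dplyr or any other package:

```r
# Hypothetical helpers (illustrative names, not from any package).

# Largest value <= 0, or NA if the vector has none.
max_neg <- function(x) {
  x <- x[!is.na(x) & x <= 0]
  if (length(x) == 0) NA_real_ else max(x)
}

# Smallest value >= 0, or NA if the vector has none.
min_pos <- function(x) {
  x <- x[!is.na(x) & x >= 0]
  if (length(x) == 0) NA_real_ else min(x)
}

only_pos <- c(0.6, 0.7)                          # no negatives at all
suppressWarnings(max(only_pos[only_pos <= 0]))   # -Inf, which is why a guard is needed
max_neg(only_pos)                                # NA
max_neg(c(-1.7, -0.3, 0.6, NA))                  # -0.3
min_pos(c(-1.8, -0.3, 0.5, 0.8))                 # 0.5
```

With these defined, the pipeline above could presumably be written as summarise(diff_start = max_neg(diff_start), diff_end = min_pos(diff_end)).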

Upvotes: 2

m.k.

Reputation: 332

Give this a go:

max(data$diff_start[data$diff_start < 0]) 
min(data$diff_end[data$diff_end > 0])

Result:

> max(data$diff_start[data$diff_start < 0]) 
[1] -0.3
> min(data$diff_end[data$diff_end > 0])
[1] 0.5

Edit:

To maintain the grouping you can use:

by(data, list(data$id, data$lab), function(x) {
    c(max(x$diff_start[x$diff_start < 0]),
    min(x$diff_end[x$diff_end > 0]))
})

Output

[1] -0.3  0.5
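
If you want a data frame shaped like the desired output rather than a by object, one possible base R variant is to split the data by group and bind one summary row per group. This is a sketch under the same assumptions as the by() call (strict < 0 and > 0, and every group has at least one negative diff_start and one positive diff_end); the names groups and result are just illustrative:

```r
data <- read.table(text = "
id lab diff_start diff_end
1 hb -1.7 -1.8
1 hb -0.3 -0.3
1 hb 0.6 0.5
1 hb 0.7 0.8", header = TRUE)

# One data frame per (id, lab) combination; drop combinations with no rows.
groups <- split(data, list(data$id, data$lab), drop = TRUE)

# Summarise each group into a one-row data frame, then stack the rows.
result <- do.call(rbind, lapply(groups, function(x) {
  data.frame(
    id         = x$id[1],
    lab        = x$lab[1],
    diff_start = max(x$diff_start[x$diff_start < 0]),
    diff_end   = min(x$diff_end[x$diff_end > 0])
  )
}))
rownames(result) <- NULL
result
```

For the sample data this gives a single row with diff_start = -0.3 and diff_end = 0.5. A group without any negative diff_start (or positive diff_end) would produce -Inf (or Inf) with a warning, so a guard would be needed in that case.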

Upvotes: 4
