Luke Harold
Luke Harold

Reputation: 49

Keeping one row and discarding others in R using specific criteria?

I'm working with the data frame below, which is just part of the full data, and I need to condense the duplicate numbers in the id column into one row. I want to preserve the row that has the highest sbp number, unless it's 300 or over, in which case I want to discard that too.

So for example, for the first three rows that have id as 13480, I want to keep the row that has 124 and discard the other two.

id,sex,visits,sbp    
13480,M,2,124
13480,M,3,306
13480,M,4,116
13520,M,2,124
13520,M,3,116
13520,M,4,120
13580,M,2,NA
13580,M,3,124

This is the farthest I got, been trying to tweak this but not sure I'm on the right track:

maxsbp <- split(sbp, sbp$sbp)
  r <- data.frame()
  for (i in 1:length(maxsbp)){
    one <- maxsbp[[i]]
    index <- which(one$sbp == max(one$sbp))
    select <- one[index,]
    r <- rbind(r, select) 
}
  r1 <- r[!(sbp$sbp>=300),]
  r1

Upvotes: 0

Views: 45

Answers (3)

AnilGoyal
AnilGoyal

Reputation: 26218

slice_head can also be used

my.df <- data.frame("id" = c(13480,13480,13480,13520,13520,13520,13580,13580),
                    "sex" = c("M","M","M","M","M","M","M","M"),
                    "sbp"= c(124,306,116,124,116,120,NA,124))

> my.df
     id sex sbp
1 13480   M 124
2 13480   M 306
3 13480   M 116
4 13520   M 124
5 13520   M 116
6 13520   M 120
7 13580   M  NA
8 13580   M 124

Proceed simply like this


my.df %>% group_by(id, sex) %>%
  arrange(desc(sbp)) %>%
  slice_head() %>%
  filter(sbp <300)

# A tibble: 2 x 3
# Groups:   id, sex [2]
     id sex     sbp
  <dbl> <chr> <dbl>
1 13520 M       124
2 13580 M       124

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388862

In R, very rarely you'll require explicit for loops to do tasks.

There are functions available which will help you perform such grouped operations.

For example, in base R you can use subset and ave :

subset(df,sbp == ave(sbp,id,FUN = function(x) max(sbp[sbp <= 300],na.rm = TRUE)))

#     id sex visits sbp
#1 13480   M      2 124
#4 13520   M      2 124
#8 13580   M      3 124

The same can be done using dplyr whose syntax is a little bit easier to understand.

library(dplyr)
df %>%
  group_by(id) %>%
  filter(sbp == max(sbp[sbp <= 300], na.rm = TRUE))

Upvotes: 1

Lorenzo Bertola
Lorenzo Bertola

Reputation: 23

I think a tidy solution to this would work quite well. I would first filter all values above 300, if you do not want to keep any value above that threshold. Then group_by id, order, and keep the first.

my.df <- data.frame("id" = c(13480,13480,13480,13520,13520,13520,13580,13580),
                       "sex" = c("M","M","M","M","M","M","M","M"),
                       "sbp"= c(124,306,116,124,116,120,NA,124))
my.df %>% filter(sbp < 300) # filter to retain only values below 300
      %>% group_by(id) # group by id
      %>% arrange(-sbp) # arrange by id in descending order
      %>% top_n(1, sbp) # retain first value i.e. the largest

# A tibble: 3 x 3
# Groups:   id [3]
#     id sex     sbp
#  <dbl> <chr> <dbl>
#1 13480 M       124
#2 13520 M       124
#3 13580 M       124

Upvotes: 1

Related Questions