Reputation: 137

Organizing a data frame with multiple entries per sample

I have the following data frame with several entries per individual:

record_id<-c(21,21,21,15,15,15,2,2,2,2,3,3,3)
var<-c(0,0,0,1,0,0,1,1,0,0,1,1,0)
data<-data.frame(cbind(record_id,var))

I want to create a new data frame with just one row per record_id. If any of an individual's (record_id) rows has data$var == 1, the output row for that individual must be 1; otherwise it must be 0.

So, the outcome would be like this:

record_id<-c(21,15,2,3)
var<-c(0,1,1,1)
data_sol<-data.frame(cbind(record_id,var))

I have tried this:

DF1 <- data %>%
  group_by(record_id) %>% 
  mutate(class = ifelse(var==1,1,0)) %>%
  ungroup

I know it's not the best way; I was planning to extract the unique values afterwards, but it did not do the trick.

Upvotes: 0

Answers (3)

akrun

Reputation: 887068

We can do

library(dplyr)
data %>%
    group_by(record_id) %>%
    summarise(var = +(mean(var) != 0))

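Or using slice_max (with_ties = FALSE is needed so that only one row is kept when several rows in a group share the maximum)

data %>%
  group_by(record_id) %>%
  slice_max(order_by = var, n = 1, with_ties = FALSE)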

Upvotes: 1

GuedesBF

Reputation: 9858

If your 'var' is all zeroes or ones, you can also use max():

data %>%
  group_by(record_id) %>%
  summarise(new_var = max(var))

# A tibble: 4 x 2
  record_id new_var
      <dbl>   <dbl>
1         2       1
2         3       1
3        15       1
4        21       0
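
If 'var' might hold values other than 0 and 1, a variant built on any() states the condition from the question directly. This is only a sketch: the name data_any is made up for illustration, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  record_id = c(21, 21, 21, 15, 15, 15, 2, 2, 2, 2, 3, 3, 3),
  var       = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0)
)

# One row per record_id: 1 if any row in the group has var == 1, otherwise 0.
# data_any is a hypothetical name used only for this example.
data_any <- data %>%
  group_by(record_id) %>%
  summarise(var = as.integer(any(var == 1)), .groups = "drop")

data_any
```

As with max(), summarise() returns the groups sorted by record_id rather than in their original order.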

Upvotes: 2

maydin

Reputation: 3755

You can use mean() inside mutate() to detect whether a group contains any non-zero value (this assumes var is never negative), like so:

data %>%
  group_by(record_id) %>% 
  mutate(var = ifelse(mean(var)!=0,1,0)) %>%
  distinct(record_id,var)

gives,

# A tibble: 4 x 2
# Groups:   record_id [4]
#       record_id   var
#           <dbl> <dbl>
#     1        21     0
#     2        15     1
#     3         2     1
#     4         3     1
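
For comparison, the same per-group reduction can probably be done without dplyr. The sketch below uses base R's aggregate() with max, so it assumes var only contains 0 and 1; data_base and data_ordered are made-up names for this example.

```r
# Example data from the question
data <- data.frame(
  record_id = c(21, 21, 21, 15, 15, 15, 2, 2, 2, 2, 3, 3, 3),
  var       = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0)
)

# aggregate() applies max() to var within each record_id
# (assumes var is only 0 or 1, so the maximum is 1 exactly when a 1 is present)
data_base <- aggregate(var ~ record_id, data = data, FUN = max)

# aggregate() sorts by record_id; restore the order of first appearance
data_ordered <- data_base[match(unique(data$record_id), data_base$record_id), ]

data_ordered
```

The match() step is only needed if the rows should appear in the same order as in the desired output from the question (21, 15, 2, 3).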

Upvotes: 1
