HNSKD
HNSKD

Reputation: 1644

How to do row slice for defined groupings in a dataframe while excluding NAs?

I have a dataframe in r as follows:

df<-data.frame(id=rep(1:3,each=2),date=rep(c(NA,"date1","date2"),each=2),value=rep(c(4.1,9.4,5.6,6),c(2,2,1,1)))
#   id  date value
# 1  1  <NA>   4.1
# 2  1  <NA>   4.1
# 3  2 date1   9.4
# 4  2 date1   9.4
# 5  3 date2   5.6
# 6  3 date2   6.0

I would like to remove duplicate rows from the dataframe, such that if there is more than 1 row that has the same id, same date and same value, then I would only take the first row and remove others.

My first instinct is to do the following:

library(dplyr)
df %>% group_by(id,date,value) %>% slice(1)

However, what I want is

#   id  date value
# 1  1  <NA>   4.1
# 2  1  <NA>   4.1
# 3  2 date1   9.4
# 4  3 date2   5.6
# 5  3 date2   6.0

which is to disregard the slice(1) for rows with empty "date".

How can I do this?

Upvotes: 1

Views: 127

Answers (1)

akrun
akrun

Reputation: 887088

We can use filter

library(dplyr)
df %>% 
    filter(is.na(date)|!duplicated(cbind(id, date, value)))
#   id  date value
#1  1  <NA>   4.1
#2  1  <NA>   4.1
#3  2 date1   9.4
#4  3 date2   5.6
#5  3 date2   6.0

Or with slice

df %>%
    group_by(id,date,value) %>%
     slice(unique(c(1, which(is.na(date)))))
# A tibble: 5 x 3
# Groups: id, date, value [4]
#     id   date value
#  <int> <fctr> <dbl>
#1     1     NA   4.1
#2     1     NA   4.1
#3     2  date1   9.4
#4     3  date2   5.6
#5     3  date2   6.0

With base R, we can do this more compactly

df[!duplicated(df) |is.na(df$date),]
#  id  date value
#1  1  <NA>   4.1
#2  1  <NA>   4.1
#3  2 date1   9.4
#5  3 date2   5.6
#6  3 date2   6.0

Upvotes: 1

Related Questions