Reputation: 119
I need a help with sort and delete rows in dataframe.
Example data:
data <- data.frame(
CODE1 = c(111,111,111,666,666,777,777),
CODE2 = c(100,101,102,102,103,103,104),
VALUE = c(3,5,5,3,2,2,1))
I want to remove duplicate rows from column CODE1 and the smallest value from column VALUE.
Result:
output <- data.frame(
CODE1 = c(111,111,666,777),
CODE2 = c(101,102,102,103),
VALUE = c(5,5,3,2))
I used code like this:
data <- data[order(data$CODE1, -data$VALUE),]
data <- data[!duplicated(data$CODE1),]
But this causes a random element to be deleted if the largest VALUE occurs 2 or more times. Does anyone know what can I do?
Upvotes: 1
Views: 71
Reputation: 160407
I don't think you can have "remove duplicate rows from column CODE1" while at the same time only dropping the smallest VALUE.
Try these.
All of them include a check to determine if there is only one value in the group; if a group had just 1, then it would always be filtered out (since min(value) == value
, using the "min" approach).
do.call(rbind, by(dat, dat$code1, function(z) z[nrow(z) == 1L | rank(z$value) > 1,]))
# code1 code2 value
# 111.2 111 101 5
# 111.3 111 102 5
# 666 666 102 3
# 777 777 103 2
Inspired by ThomasIsCoding's use of subset
/ave
, here's an alternative:
subset(dat,
ave(value, code1, FUN = function(z) if (length(z) > 1) rank(z) else 2) > 1)
library(dplyr)
dat %>%
group_by(code1) %>%
filter(n() == 1L | dense_rank(value) > 1) %>%
ungroup()
# # A tibble: 4 x 3
# code1 code2 value
# <chr> <chr> <dbl>
# 1 111 101 5
# 2 111 102 5
# 3 666 102 3
# 4 777 103 2
library(data.table)
DT <- as.data.table(dat)
DT[, .SD[ .N == 1L | frank(value) > 1, ], by = .(code1) ]
# code1 code2 value
# <char> <char> <num>
# 1: 111 101 5
# 2: 111 102 5
# 3: 666 102 3
# 4: 777 103 2
Data
dat <- data.frame(code1=c('111','111','111','666','666','777','777'), code2=c('100','101','102','102','103','103','104'), value=c(3,5,5,3,2,2,1))
Upvotes: 1
Reputation: 101044
A simple base R option with subset
+ ave
subset(
data,
ave(VALUE,CODE1,FUN = min)!=VALUE
)
gives
CODE1 CODE2 VALUE
2 111 101 5
3 111 102 5
4 666 102 3
6 777 103 2
A data.table
option
> setDT(data)[,.SD[VALUE!=min(VALUE)],CODE1]
CODE1 CODE2 VALUE
1: 111 101 5
2: 111 102 5
3: 666 102 3
4: 777 103 2
A dplyr
option
> data %>%
+ group_by(CODE1) %>%
+ filter(min(VALUE)!=VALUE)
# A tibble: 4 x 3
# Groups: CODE1 [3]
CODE1 CODE2 VALUE
<dbl> <dbl> <dbl>
1 111 101 5
2 111 102 5
3 666 102 3
4 777 103 2
Upvotes: 1