Mostafa90
Mostafa90

Reputation: 1706

Replace values by the most present value

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
df <- tibble(ID, Value)

I have to group by the ID, keep the value of Value which is the most repeated If the values are equal (ID == "B"), I choose the first value

The value variable should look like :

Value_output <- c("blue", "blue", "blue", "red", "red", NA)

Upvotes: 1

Views: 62

Answers (3)

akrun
akrun

Reputation: 886938

We can get the Mode by group

library(dplyr)
df %>%
   group_by(ID) %>%
   arrange(ID, is.na(Value)) %>% # in case to keep non- NA elements for a tie
   mutate(Value_output = Mode(Value))

where

 Mode <- function(x) {
   ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
 }

data

ID <- c("A", "A", "A", "B", "B", "c", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA, "yellow")
df <- tibble(ID, Value)

Upvotes: 2

s_baldur
s_baldur

Reputation: 33498

Using base R:

lot <- aggregate(
  Value ~ ID, 
  df, 
  function(x) names(sort(table(x), decreasing=TRUE))[1]
)
df$Value <- lot[match(df$ID, lot$ID), "Value"]
df
  ID    Value 
  <chr> <chr> 
1 A     blue  
2 A     blue  
3 A     blue  
4 B     orange
5 B     orange
6 c     NA   

Upvotes: 2

pogibas
pogibas

Reputation: 28309

Solution with data.table package (count Value by ID).

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)

library(data.table)
foo <- data.table(ID, Value)
setkey(foo, ID)
foo[foo[, .N, .(ID, Value)][order(N, decreasing = TRUE)][, .(Value = Value[1]), ID]]$i.Value
[1] "blue" "blue" "blue" "red"  "red"  NA    

Upvotes: 2

Related Questions