Reputation: 33
I have a data frame whose ID column contains many duplicated names, so I used the table() function to get the frequency of each ID, like this:
library(dplyr)
id <- runif(1000,1000,3000) %>% round() %>% as.character()
freq <- rep(1:50,20)
data <- data.frame(id,freq)
GetID <- function(a) {
  if (a[2] == 1) newid <- a[1] else newid <- paste(a[1], 1:a[2], sep = "-")
  return(newid)
}
idlist <- data %>% apply(., 1, GetID)
idlist2 <- unlist(idlist) %>% as.data.frame()
I want to get a new ID vector. If freq equals 1, the new ID is the same as the old one. If freq is larger than 1, the new ID is the old ID combined with its order number. However, the if statement doesn't seem to work correctly: every new ID gets an order number.
Upvotes: 0
Views: 284
Reputation: 77
Do you have to use a function? If not:
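library(dplyr)  # needed for %>%, filter(), mutate() and arrange()
id <- runif(1000,1000,3000)
freq <- rep(1:50,20)
num <- seq_along(id)  # remember the original row order
data <- data.frame(num,id,freq)
# rows with freq == 1 keep their id unchanged
data2 <- data %>% filter(freq == 1) %>% mutate(newid = id)
# all other rows get the freq value appended to the id
data3 <- data %>% filter(freq != 1) %>% mutate(newid = paste(id,freq,sep = "-"))
# recombine and restore the original order
result <- rbind(data2,data3) %>% arrange(num)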
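As for why the if in the original function always seemed to take the else branch: a likely cause (an assumption on my part, checked only against the small made-up data below) is that apply() converts the data frame to a matrix first. Because id is character, the whole matrix becomes character and the freq values are padded to a common width, so 1 turns into " 1" and a[2]==1 is FALSE. The sketch below shows the padding and one way to keep a function while avoiding apply(); make_ids is a hypothetical name, not something from the question.

```r
# Tiny, made-up stand-in for the question's data (one row per id)
data <- data.frame(id = c("1001", "1002", "1003"),
                   freq = c(1L, 3L, 10L),
                   stringsAsFactors = FALSE)

# apply() works on as.matrix(data); with a character column present,
# the numeric column is formatted to a common width
padded <- as.matrix(data)[, "freq"]
print(padded)          # freq values are now strings such as " 1"
print(padded[1] == 1)  # FALSE, because " 1" is compared with "1"

# Hypothetical helper: receives id and freq separately, so freq stays numeric
make_ids <- function(id, freq) {
  if (freq == 1) id else paste(id, seq_len(freq), sep = "-")
}

# Map() calls make_ids once per row; unlist() flattens the result
idlist <- unlist(Map(make_ids, data$id, data$freq), use.names = FALSE)
print(idlist)
```

With this version an id with freq 1 comes back unchanged, and an id with freq 3 comes back as three ids ending in -1, -2 and -3.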
Upvotes: 1
Reputation: 388982
You can group_by id and, if the number of rows in a group is greater than 1, paste row_number() onto id; otherwise just use id.
library(dplyr)
data %>%
  group_by(id) %>%
  mutate(newID = if (n() > 1) paste(id, row_number(), sep = '-')
                 else as.character(id)) %>%
  arrange(id)
#   id     freq newID
#   <chr> <int> <chr>
# 1 1002     49 1002-1
# 2 1002     31 1002-2
# 3 1003     26 1003
# 4 1005     11 1005-1
# 5 1005     28 1005-2
# 6 1007     37 1007
# 7 1013     33 1013
# 8 1016      7 1016
# 9 1020     11 1020
#10 1024     28 1024
# … with 990 more rows
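If dplyr is not available, the same grouping idea can probably be sketched in base R with ave(). This is only an illustration on made-up ids, and the names n_in_group and pos_in_group are mine, not part of the code above.

```r
# Made-up ids with some duplicates (one row per occurrence)
data <- data.frame(id = c("1002", "1003", "1002", "1005", "1005", "1007"),
                   stringsAsFactors = FALSE)

# Group size and position within the group, computed per id
n_in_group   <- ave(seq_along(data$id), data$id, FUN = length)
pos_in_group <- ave(seq_along(data$id), data$id, FUN = seq_along)

# Append the position only when the id occurs more than once
data$newID <- ifelse(n_in_group > 1,
                     paste(data$id, pos_in_group, sep = "-"),
                     data$id)
print(data)
```

Unlike the dplyr version, this keeps the rows in their original order, so sort by id afterwards if that matters.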
Upvotes: 0