Reputation: 3432
I have the following data frame, df:
Names;Sp
Unknown;SP1
Unknown;SP1
Unknown;SP1
Unknown;SP2
Unknown;SP2
Unknown;SP3
Unknown;SP4
OK;SP4
OK;SP5
Unknown;SPA
Unknown;SPB
Unknown;SP1
I would like to add a "_number" suffix to each occurrence of "Unknown" in df$Names, with a different number for each unique combination of "Unknown" and df$Sp (that is, one number per distinct Sp value among the "Unknown" rows).
Here is the desired output:
Names;Sp
Unknown_1;SP1
Unknown_1;SP1
Unknown_1;SP1
Unknown_2;SP2
Unknown_2;SP2
Unknown_3;SP3
Unknown_4;SP4
OK;SP4
OK;SP5
Unknown_5;SPA
Unknown_6;SPB
Unknown_1;SP1
Here are the data:
structure(list(Names = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 2L, 2L, 2L), .Label = c("OK", "Unknown"), class = "factor"),
Sp = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 6L,
7L, 1L), .Label = c("SP1", "SP2", "SP3", "SP4", "SP5", "SPA",
"SPB"), class = "factor")), class = "data.frame", row.names = c(NA,
-12L))
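A minimal base R sketch of this numbering rule, assuming the data from the dput() above. It uses match() and unique(), numbers the distinct Sp values among the "Unknown" rows in order of first appearance (which, for this data, gives the desired output), and converts Names to character so the new labels are not limited by the factor levels.

```r
# Rebuild the example data (both columns are factors, as in the dput above)
df <- data.frame(
  Names = factor(c(rep("Unknown", 7), "OK", "OK", rep("Unknown", 3))),
  Sp = factor(c("SP1", "SP1", "SP1", "SP2", "SP2", "SP3", "SP4",
                "SP4", "SP5", "SPA", "SPB", "SP1"))
)

is_unknown <- df$Names == "Unknown"

# One id per distinct Sp value among the "Unknown" rows, in order of first appearance
sp_unknown <- as.character(df$Sp[is_unknown])
ids <- match(sp_unknown, unique(sp_unknown))

# Work on a character copy so new labels are not constrained by factor levels
new_names <- as.character(df$Names)
new_names[is_unknown] <- paste("Unknown", ids, sep = "_")
df$Names <- new_names

print(df)
```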
Upvotes: 0
Views: 248
Reputation: 388817
Another data.table approach:
library(data.table)
setDT(df)[Names == "Unknown",
Names := paste(Names, match(Sp, unique(Sp)), sep = '_')]
df
# Names Sp
# 1: Unknown_1 SP1
# 2: Unknown_1 SP1
# 3: Unknown_1 SP1
# 4: Unknown_2 SP2
# 5: Unknown_2 SP2
# 6: Unknown_3 SP3
# 7: Unknown_4 SP4
# 8: OK SP4
# 9: OK SP5
#10: Unknown_5 SPA
#11: Unknown_6 SPB
#12: Unknown_1 SP1
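The key piece in the approach above is match(Sp, unique(Sp)): unique() keeps values in order of first appearance, and match() returns each element's position in that vector, so repeated values get the same number. A small illustration on a plain character vector (hypothetical data; inside the data.table call, Sp is the factor column restricted to the "Unknown" rows):

```r
sp <- c("SP1", "SP1", "SP2", "SP3", "SP1")

# unique() keeps first-appearance order: "SP1" "SP2" "SP3"
first_seen <- unique(sp)

# match() gives each element's position in first_seen
ids <- match(sp, first_seen)
print(ids)  # [1] 1 1 2 3 1
```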
The dplyr translation isn't that pretty:
library(dplyr)
df %>%
  filter(Names == 'Unknown') %>%
  mutate(Names = paste(Names, match(Sp, unique(Sp)), sep = '_')) %>%
  bind_rows(df %>% filter(Names != 'Unknown'))
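Because bind_rows() appends the non-"Unknown" rows after the "Unknown" ones, the dplyr pipeline above returns the two "OK" rows at the bottom rather than in rows 8 and 9. If the original row order matters, one possible variant (a sketch, not part of the answer above) computes the ids inside a single mutate() and uses if_else(), so no rows are removed and re-added. The names result and id are illustrative.

```r
library(dplyr)

# Example data from the question (factor columns, as in the dput)
df <- data.frame(
  Names = factor(c(rep("Unknown", 7), "OK", "OK", rep("Unknown", 3))),
  Sp = factor(c("SP1", "SP1", "SP1", "SP2", "SP2", "SP3", "SP4",
                "SP4", "SP5", "SPA", "SPB", "SP1"))
)

result <- df %>%
  mutate(
    Names = as.character(Names),
    # ids come only from the "Unknown" rows, in order of first appearance
    id = match(Sp, unique(Sp[Names == "Unknown"])),
    # Non-"Unknown" rows keep their original name (their id is ignored)
    Names = if_else(Names == "Unknown", paste(Names, id, sep = "_"), Names)
  ) %>%
  select(-id)

print(result)
```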
Upvotes: 1
Reputation: 6132
library(tidyverse)
df %>%
  mutate(Names = ifelse(Names == "Unknown",
                        paste0("Unknown_", 1 + cumsum(Names == "Unknown" & Sp != lag(Sp, default = first(Sp)))),
                        as.character(Names)))
Names Sp
1 Unknown_1 SP1
2 Unknown_1 SP1
3 Unknown_1 SP1
4 Unknown_2 SP2
5 Unknown_2 SP2
6 Unknown_3 SP3
7 Unknown_4 SP4
8 OK SP4
9 OK SP5
10 Unknown_5 SPA
11 Unknown_6 SPB
12 Unknown_7 SP1
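This cumsum()/lag() version numbers consecutive runs of Sp rather than distinct values: the counter increases every time Sp changes from one row to the next, which is why row 12 (SP1 again) gets Unknown_7 instead of the requested Unknown_1. A minimal base R comparison of the two numbering schemes on a toy vector (hypothetical data):

```r
sp <- c("SP1", "SP1", "SP2", "SP1")

# Run-based ids: add 1 whenever a value differs from the previous one
# (the first element is compared with itself, like lag(default = first(sp)))
previous <- c(sp[1], head(sp, -1))
run_id <- 1 + cumsum(sp != previous)

# Value-based ids: one number per distinct value, in order of first appearance
value_id <- match(sp, unique(sp))

print(run_id)    # [1] 1 1 2 3
print(value_id)  # [1] 1 1 2 1
```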
Upvotes: 2
Reputation: 6489
Using the data.table package, you could solve your problem as follows (assuming your data.frame is named df):
library(data.table)
setDT(df)[Names == "Unknown", Names := paste0("Unknown_", .GRP), by = Sp]
df
# Names Sp
# 1: Unknown_1 SP1
# 2: Unknown_1 SP1
# 3: Unknown_1 SP1
# 4: Unknown_2 SP2
# 5: Unknown_2 SP2
# 6: Unknown_3 SP3
# 7: Unknown_4 SP4
# 8: OK SP4
# 9: OK SP5
# 10: Unknown_5 SPA
# 11: Unknown_6 SPB
# 12: Unknown_1 SP1
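This works because .GRP is data.table's group counter: with by = (as opposed to keyby =), groups are visited in order of first appearance among the selected rows, so each distinct Sp among the "Unknown" rows gets its own number. A small sketch of that ordering on hypothetical data, with keyby = shown for contrast (it sorts the groups first):

```r
library(data.table)

dt <- data.table(Sp = c("SPB", "SP1", "SPB", "SPA"))

# by = keeps groups in order of first appearance: SPB -> 1, SP1 -> 2, SPA -> 3
dt[, id := .GRP, by = Sp]
print(dt$id)  # [1] 1 2 1 3

# keyby = sorts the groups first (SP1, SPA, SPB), so the numbering changes
sorted_ids <- dt[, .(id = .GRP), keyby = Sp]
print(sorted_ids)
```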
Upvotes: 5