chippycentra

Reputation: 3432

Add a numeric index suffix to rows for each unique value in another column in R

I have the following data frame, df:

Names;Sp 
Unknown;SP1
Unknown;SP1
Unknown;SP1
Unknown;SP2
Unknown;SP2
Unknown;SP3
Unknown;SP4
OK;SP4
OK;SP5
Unknown;SPA
Unknown;SPB
Unknown;SP1

I would like to add a "_number" suffix to each occurrence of "Unknown" in df$Names, with a different number for each unique value of df$Sp among the "Unknown" rows, numbered in order of first appearance.

Here is the desired output:

Names;Sp 
Unknown_1;SP1
Unknown_1;SP1
Unknown_1;SP1
Unknown_2;SP2
Unknown_2;SP2
Unknown_3;SP3
Unknown_4;SP4
OK;SP4
OK;SP5
Unknown_5;SPA
Unknown_6;SPB
Unknown_1;SP1

Here are the data:

df <- structure(list(Names = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L), .Label = c("OK", "Unknown"), class = "factor"), 
    Sp = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 6L, 
    7L, 1L), .Label = c("SP1", "SP2", "SP3", "SP4", "SP5", "SPA", 
    "SPB"), class = "factor")), class = "data.frame", row.names = c(NA, 
-12L))

Upvotes: 0

Views: 248

Answers (3)

Ronak Shah

Reputation: 388817

Another data.table approach:

library(data.table)
setDT(df)[Names == "Unknown", 
          Names := paste(Names, match(Sp, unique(Sp)), sep = '_')]
df

#        Names  Sp
# 1: Unknown_1 SP1
# 2: Unknown_1 SP1
# 3: Unknown_1 SP1
# 4: Unknown_2 SP2
# 5: Unknown_2 SP2
# 6: Unknown_3 SP3
# 7: Unknown_4 SP4
# 8:        OK SP4
# 9:        OK SP5
#10: Unknown_5 SPA
#11: Unknown_6 SPB
#12: Unknown_1 SP1
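
The key step above is `match(Sp, unique(Sp))`: `unique()` keeps values in order of first appearance, and `match()` returns each element's position in that vector, so repeated values share one index. A minimal base R sketch on a toy vector (the name `sp` is only for illustration and is not part of the answer):

```r
# Toy vector standing in for the Sp values of the "Unknown" rows
sp <- c("SP1", "SP1", "SP2", "SP3", "SP1")

# unique() keeps first-appearance order; match() looks up each value's position
idx <- match(sp, unique(sp))
idx
# [1] 1 1 2 3 1

paste("Unknown", idx, sep = "_")
# [1] "Unknown_1" "Unknown_1" "Unknown_2" "Unknown_3" "Unknown_1"
```

Because the index depends on the value rather than on its position, the last `SP1` gets the same number as the first ones.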

The dplyr translation is less tidy, and it moves the non-"Unknown" rows to the end of the result:

library(dplyr)

df %>%
  filter(Names == 'Unknown') %>%
  mutate(Names = paste(Names, match(Sp, unique(Sp)), sep = '_')) %>%
  bind_rows(df %>% filter(Names != 'Unknown')) 
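
If the original row order matters, one possible variant keeps every row in place by computing the index inside a single `mutate()`. This is only a sketch, not part of the answer above: the helper columns `unknown` and `idx` are hypothetical names, and the data are rebuilt here with character columns (the question's columns are factors) to keep the example self-contained.

```r
library(dplyr)

# Question's data, rebuilt with character columns (an assumption for simplicity)
df <- data.frame(
  Names = c(rep("Unknown", 7), "OK", "OK", rep("Unknown", 3)),
  Sp    = c("SP1", "SP1", "SP1", "SP2", "SP2", "SP3", "SP4",
            "SP4", "SP5", "SPA", "SPB", "SP1"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(
    unknown = Names == "Unknown",
    # Index each Sp among the "Unknown" rows only, in order of first appearance
    idx     = match(Sp, unique(Sp[unknown])),
    # Only "Unknown" rows receive the suffix; other rows keep their name
    Names   = ifelse(unknown, paste(Names, idx, sep = "_"), Names)
  ) %>%
  select(-unknown, -idx)

res$Names
```

Rows 8 and 9 stay "OK" in their original positions, and row 12 becomes "Unknown_1" again.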

Upvotes: 1

Lennyy

Reputation: 6132

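library(tidyverse)

# Note: the counter increases whenever Sp differs from the previous row,
# so a value that reappears later (SP1 in row 12) gets a new number.
df %>% 
  mutate(Names = ifelse(Names == "Unknown", 
                        paste0("Unknown_", 1 + cumsum(Names == "Unknown" & Sp != lag(Sp, default = first(Sp)))),
                        as.character(Names)))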
       Names  Sp
1  Unknown_1 SP1
2  Unknown_1 SP1
3  Unknown_1 SP1
4  Unknown_2 SP2
5  Unknown_2 SP2
6  Unknown_3 SP3
7  Unknown_4 SP4
8         OK SP4
9         OK SP5
10 Unknown_5 SPA
11 Unknown_6 SPB
12 Unknown_7 SP1
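
Row 12 comes out as Unknown_7 rather than the Unknown_1 requested in the question, because a `cumsum()` over changes numbers consecutive runs, not distinct values. The difference can be seen on a toy vector in base R (the names `sp`, `run_id` and `val_id` are illustrative only):

```r
sp <- c("SP1", "SP1", "SP2", "SP1")

# Run-based numbering: the counter increases at every change from the previous element
run_id <- 1 + cumsum(c(FALSE, sp[-1] != sp[-length(sp)]))
run_id
# [1] 1 1 2 3

# Value-based numbering: each distinct value keeps the index of its first appearance
val_id <- match(sp, unique(sp))
val_id
# [1] 1 1 2 1
```

The two schemes agree only when every value appears in a single contiguous block.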

Upvotes: 2

B. Christian Kamgang

Reputation: 6489

Using the data.table package, you could solve this as follows (assuming your data.frame is named df):

library(data.table)

setDT(df)[Names == "Unknown", Names := paste0("Unknown_", .GRP), by=Sp]

#         Names     Sp
#  1: Unknown_1    SP1
#  2: Unknown_1    SP1
#  3: Unknown_1    SP1
#  4: Unknown_2    SP2
#  5: Unknown_2    SP2
#  6: Unknown_3    SP3
#  7: Unknown_4    SP4
#  8:        OK    SP4
#  9:        OK    SP5
# 10: Unknown_5    SPA
# 11: Unknown_6    SPB
# 12: Unknown_1    SP1
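
To see what `.GRP` contributes, here is a small sketch on toy data (the table `dt` and the column `grp` are hypothetical, not from the answer). With a row filter in `i` and `by = Sp`, the groups are formed from the filtered rows only and numbered in order of first appearance; rows outside the filter are left untouched.

```r
library(data.table)

dt <- data.table(
  Names = c("Unknown", "OK", "Unknown", "Unknown"),
  Sp    = c("B", "B", "A", "B")
)

# .GRP is the running group counter; only rows matching the filter are assigned
dt[Names == "Unknown", grp := .GRP, by = Sp]
dt$grp
# [1]  1 NA  2  1
```

The "OK" row gets `NA` because it never enters the grouping, which is why the "OK" rows do not consume a number in the answer above.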

Upvotes: 5
