tantan

Reputation: 57

R: How to mutate new ID by modifying previous ID?

I previously asked a related question (How to mutate a new column by modifying another column?).

Now I have another problem: I have to work with more 'untidy' IDs, like this:

df1 <- data.frame(id=c("A-1","A-10","A-100","b-1","b-10","b-100"),n=c(1,2,3,4,5,6))

From these IDs, I want to create new 'tidy' IDs, like this:

df2 <- data.frame(id=c("A0001","A0010","A0100","B0001","B0010","B0100"),n=c(1,2,3,4,5,6))

(Note that I now need a capital 'B' instead of 'b'.)

I tried to use the str_pad function, but I couldn't get it to work.

Upvotes: 0

Views: 157

Answers (3)

Ronak Shah

Reputation: 389047

We can separate the id into two columns at "-", convert the letters to uppercase, left-pad the numbers with zeros using sprintf, and combine the two columns again with unite.

library(dplyr)
library(tidyr)

df1 %>%
  separate(id, c("id1", "id2"), sep = "-") %>%
  mutate(id1 = toupper(id1),
         # %04d zero-pads integers portably; the 0 flag on strings (%04s) is platform-dependent
         id2 = sprintf('%04d', as.integer(id2))) %>%
  unite(id, id1, id2, sep = "")

#     id n
#1 A0001 1
#2 A0010 2
#3 A0100 3
#4 B0001 4
#5 B0010 5
#6 B0100 6
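
A note on the padding step: R's sprintf documentation describes the 0 flag as platform-dependent for character input, so padding a string with a %s conversion may give spaces instead of zeros on some systems. A minimal sketch of two ways to pad that do not rely on that, using a hypothetical id2 vector like the one produced by separate:

```r
# Hypothetical example values, as they would look after separate()
id2 <- c("1", "10", "100")

# Convert to integer first, then zero-pad with the %d conversion
padded_sprintf <- sprintf("%04d", as.integer(id2))

# formatC with flag = "0" is an equivalent base R option
padded_formatC <- formatC(as.integer(id2), width = 4, flag = "0")

padded_sprintf
# [1] "0001" "0010" "0100"
```

Both versions assume the numeric part never exceeds four digits; longer numbers are left unpadded rather than truncated.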

Based on the comment: if some IDs have no separator and certain id1 values need to be recoded, we can use extract with a regular expression instead.

df1 %>%
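  # [[:alpha:]] is the POSIX class inside a bracket expression; "-?" makes the separator optional
  extract(id, c("id1", "id2"), regex = "([[:alpha:]])-?(\\d+)") %>%
  mutate(id1 = case_when(id1 == 'c' ~ 'B',
                         TRUE ~ id1),
         id1 = toupper(id1),
         id2 = sprintf('%04d', as.integer(id2))) %>%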
  unite(id, id1, id2, sep = "")

Upvotes: 1

nurandi

Reputation: 1618

Base R solution

df1$id <- sub("^(.)0+?(.{4})$","\\1\\2", sub("-", "0000", toupper(df1$id)))

tidyverse solution

library(tidyverse)
df1$id <- str_to_upper(df1$id) %>%
  str_replace("-","0000") %>%
  str_replace("^(.)0+?(.{4})$","\\1\\2")

Output

df1

#      id n
# 1 A0001 1
# 2 A0010 2
# 3 A0100 3
# 4 B0001 4
# 5 B0010 5
# 6 B0100 6
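
To see why the two substitutions work, it can help to look at the intermediate strings. The sketch below splits the base R one-liner into two steps; the names ids, step1 and step2 are only for illustration:

```r
ids <- c("A-1", "A-10", "A-100", "b-1", "b-10", "b-100")

# Step 1: uppercase, then replace the hyphen with four zeros,
# so every ID has at least four characters after the letter
step1 <- sub("-", "0000", toupper(ids))
step1
# [1] "A00001"   "A000010"  "A0000100" "B00001"   "B000010"  "B0000100"

# Step 2: keep the first character and the last four characters,
# dropping the surplus zeros in between
step2 <- sub("^(.)0+?(.{4})$", "\\1\\2", step1)
step2
# [1] "A0001" "A0010" "A0100" "B0001" "B0010" "B0100"
```

This relies on the prefix being a single character and the number having at most four digits.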

Data

df1 <- data.frame(id=c("A-1","A-10","A-100","b-1","b-10","b-100"),n=c(1,2,3,4,5,6))

Upvotes: 1

Edward

Reputation: 18823

The str_pad function is handy for this purpose, as you said. But you have to extract the digits first and then paste everything back together.

library(stringr)

# Extract the letter without the hyphen, so the result matches the desired "A0001" form
paste0(toupper(str_extract(df1$id, "[A-Za-z]")), 
      str_pad(str_extract(df1$id, "\\d+"), width=4, pad="0"))

[1] "A0001" "A0010" "A0100" "B0001" "B0010" "B0100"

Upvotes: 1
