Reputation: 57
I previously asked the question "How to mutate a new column by modifying another column?".
Now I have another problem. I have to work with more 'untidy' IDs like:
df1 <- data.frame(id=c("A-1","A-10","A-100","b-1","b-10","b-100"),n=c(1,2,3,4,5,6))
From these IDs, I want to assign new 'tidy' IDs like:
df2 <- data.frame(id=c("A0001","A0010","A0100","B0001","B0010","B0100"),n=c(1,2,3,4,5,6))
(now I need capital 'B' instead of 'b')
I tried to use the str_pad function, but I couldn't get it to work.
Upvotes: 0
Views: 157
Reputation: 389047
We can separate the data into different columns based on "-", convert the letters to uppercase, pad with 0's using sprintf, and combine the two columns with unite.
library(dplyr)
library(tidyr)
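df1 %>%
  separate(id, c("id1", "id2"), sep = "-") %>%
  # '%04d' zero-pads integers portably; '%04s' may pad strings with spaces on some platforms
  mutate(id1 = toupper(id1),
         id2 = sprintf('%04d', as.integer(id2))) %>%
  unite(id, id1, id2, sep = "")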
# id n
#1 A0001 1
#2 A0010 2
#3 A0100 3
#4 B0001 4
#5 B0010 5
#6 B0100 6
Based on the comment, if there are cases where there is no separator and we want to change certain id1 values, we can use the following.
df1 %>%
  # [[:alpha:]] is the POSIX letter class; the hyphen is optional
  extract(id, c("id1", "id2"), regex = "([[:alpha:]])-?(\\d+)") %>%
  mutate(id1 = case_when(id1 == 'c' ~ 'B',
                         TRUE ~ id1),
         id1 = toupper(id1),
         id2 = sprintf('%04d', as.integer(id2))) %>%
  unite(id, id1, id2, sep = "")
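For readers without the tidyverse, the same split, uppercase, zero-pad idea can be sketched in base R. The helper name tidy_id is hypothetical (it does not come from any package), and it assumes every ID is a run of letters, an optional separator, and then digits. It formats the number with '%04d' because R's sprintf documentation notes that the 0 flag on character values ('%04s') zero-pads on some platforms and is ignored on others.

```r
# Hypothetical helper: normalise IDs such as "A-1" or "b10" to an
# uppercase prefix followed by a four-digit, zero-padded number.
tidy_id <- function(id) {
  id <- as.character(id)
  # Leading run of letters, converted to uppercase
  prefix <- toupper(sub("^([[:alpha:]]+).*$", "\\1", id))
  # Drop everything before the first digit, then parse the rest as an integer
  number <- as.integer(sub("^[^0-9]*", "", id))
  # '%04d' zero-pads integers to width 4
  sprintf("%s%04d", prefix, number)
}

df1 <- data.frame(id = c("A-1", "A-10", "A-100", "b-1", "b-10", "b-100"),
                  n = c(1, 2, 3, 4, 5, 6))
df1$id <- tidy_id(df1$id)
```

Because the function is vectorised, it can also be called inside mutate(id = tidy_id(id)) if a dplyr pipeline is preferred.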
Upvotes: 1
Reputation: 1618
Base R solution
df1$id <- sub("^(.)0+?(.{4})$","\\1\\2", sub("-", "0000", toupper(df1$id)))
tidyverse solution
library(tidyverse)
df1$id <- str_to_upper(df1$id) %>%
str_replace("-","0000") %>%
str_replace("^(.)0+?(.{4})$","\\1\\2")
Output
df1
# id n
# 1 A0001 1
# 2 A0010 2
# 3 A0100 3
# 4 B0001 4
# 5 B0010 5
# 6 B0100 6
Data
df1 <- data.frame(id=c("A-1","A-10","A-100","b-1","b-10","b-100"),n=c(1,2,3,4,5,6))
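To see why the two substitutions work, it may help to look at the intermediate strings. This is only an illustrative trace in base R; the variable names ids, step1 and step2 are made up for the example, and it assumes the numeric part has at most four digits.

```r
ids <- c("A-1", "A-10", "A-100")

# Step 1: uppercase, then replace the hyphen with four zeros,
# so every ID now has more digits than needed.
step1 <- sub("-", "0000", toupper(ids))
step1
# [1] "A00001"   "A000010"  "A0000100"

# Step 2: keep the first character and the last four characters,
# dropping the surplus zeros that sit between them.
step2 <- sub("^(.)0+?(.{4})$", "\\1\\2", step1)
step2
# [1] "A0001" "A0010" "A0100"
```

The pattern anchors at both ends, so the final four characters are always kept and only the extra zeros directly after the letter are removed.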
Upvotes: 1
Reputation: 18823
The str_pad function is handy for this purpose, as you said, but you have to extract the digits first and then paste everything back together.
library(stringr)
# Extract only the letter (not the hyphen) so the result matches the requested format
paste0(toupper(str_extract(df1$id, "[A-Za-z]")),
       str_pad(str_extract(df1$id, "\\d+"), width = 4, pad = "0"))
[1] "A0001" "A0010" "A0100" "B0001" "B0010" "B0100"
Upvotes: 1