Reputation: 1106
My dataset includes three different types of values; two of them contain dashes.
df=c("20001982-02", "19933626-02", "20024861-6", "29114-1", "20109774-02",
"19965663-01", "19992655-01", "20087008-08", "140107", "20032011-09",
"139")
I need to add leading zeroes to the values that have a dash so they match the pattern XXXXXXXX-XX:
df.new =c("20001982-02", "19933626-02", "20024861-06", "00029114-01",
"20109774-02", "19965663-01", "19992655-01", "20087008-08", "140107", "20032011-09", "139")
So far I have this, but it only does part of the job (see the 4th element, which I need to be 00029114-01):
sub("^(\\d{8})-(\\d)$", "\\1-0\\2", df)
# Current (incomplete) result:
# c("20001982-02", "19933626-02", "20024861-06", "29114-1", "20109774-02",
# "19965663-01", "19992655-01", "20087008-08", "140107", "20032011-09",
# "139")
Upvotes: 0
Views: 293
Reputation: 21400
This should work:
library(stringr)
df1 <- sub("-(\\d$)", "-0\\1", df)
df2 <- ifelse(grepl("-\\d", df1),
str_pad(df1, width = 11, side = "left", pad = "0"),
df1)
[1] "20001982-02" "19933626-02" "20024861-06" "00029114-01" "20109774-02" "19965663-01" "19992655-01" "20087008-08"
[9] "140107" "20032011-09" "139"
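For readers without stringr, the same two-step idea (fix the suffix, then left-pad only the dashed values) can be sketched in base R. This is a minimal sketch, not the answerer's code: the names step1, has_dash, n_zeros and result are illustrative, and it assumes every dashed value should end up exactly 11 characters wide.

```r
# Sample input from the question
df <- c("20001982-02", "19933626-02", "20024861-6", "29114-1", "20109774-02",
        "19965663-01", "19992655-01", "20087008-08", "140107", "20032011-09",
        "139")

# Step 1: give a single-digit suffix a leading zero ("-6" becomes "-06")
step1 <- sub("-(\\d)$", "-0\\1", df)

# Step 2: left-pad only the dashed values up to 11 characters (assumed target width)
has_dash <- grepl("-", step1, fixed = TRUE)
n_zeros <- pmax(0, 11 - nchar(step1))
result <- ifelse(has_dash, paste0(strrep("0", n_zeros), step1), step1)

print(result)
```

Values without a dash ("140107", "139") pass through unchanged because the padding is applied only where has_dash is TRUE.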
Upvotes: 1
Reputation: 887088
We can use grepl with sprintf from base R. Split the dataset at "-" with read.table, use sprintf to join the pieces back into a single string, specifying the fmt that adds the leading zeros, and create the condition in ifelse to return the new format when there is a "-" or else the old value.
out <- ifelse(grepl('-', df), do.call(sprintf, c(fmt = '%08d-%02d',
read.table(text = df, header = FALSE, sep="-", fill = TRUE))), df)
identical(df.new, out)
#[1] TRUE
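To make the steps described above easier to follow, here is the same approach unrolled with an intermediate data frame. It is only a sketch: the names parts and formatted are illustrative, and it assumes both sides of the dash are purely numeric and small enough to be read as integers.

```r
# Sample input and expected result from the question
df <- c("20001982-02", "19933626-02", "20024861-6", "29114-1", "20109774-02",
        "19965663-01", "19992655-01", "20087008-08", "140107", "20032011-09",
        "139")
df.new <- c("20001982-02", "19933626-02", "20024861-06", "00029114-01",
            "20109774-02", "19965663-01", "19992655-01", "20087008-08",
            "140107", "20032011-09", "139")

# Split at "-": V1 is the part before the dash, V2 the part after it.
# fill = TRUE leaves V2 as NA for values that contain no dash.
parts <- read.table(text = df, header = FALSE, sep = "-", fill = TRUE)

# Rebuild every row as 8 zero-padded digits, a dash, and 2 zero-padded digits
formatted <- sprintf("%08d-%02d", parts$V1, parts$V2)

# Keep the rebuilt string only where the original value had a dash
out <- ifelse(grepl("-", df, fixed = TRUE), formatted, df)

identical(df.new, out)
```

Rows without a dash also get a formatted string, but ifelse discards it and keeps the original value.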
Upvotes: 2
Reputation: 6628
stringr::str_pad() is great for this.
library(stringr)
# Start from the raw input df and pad a single-digit suffix first ("-6" becomes "-06")
df_fixed <- sub("-(\\d)$", "-0\\1", df)
df_dash <- df_fixed[grepl("-", df_fixed)]
str_pad(df_dash, width = 11, pad = "0")
#> [1] "20001982-02" "19933626-02" "20024861-06" "00029114-01" "20109774-02"
#> [6] "19965663-01" "19992655-01" "20087008-08" "20032011-09"
Wrap it in ifelse() to return the original character vector with the intended modifications.
ifelse(grepl("-", df_fixed), str_pad(df_fixed, width = 11, pad = "0"), df_fixed)
#> [1] "20001982-02" "19933626-02" "20024861-06" "00029114-01" "20109774-02"
#> [6] "19965663-01" "19992655-01" "20087008-08" "140107" "20032011-09"
#> [11] "139"
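# Note: side = "right" appends zeros after the digits ("29114" becomes "29114000");
# use side = "left" to get leading zeros instead.
split_strings <- strsplit(df_dash, "-")
sapply(split_strings, function(x) paste(str_pad(x[1], width = 8, side = "right", pad = 0), x[2], sep = "-"))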
#> [1] "20001982-02" "19933626-02" "20024861-06" "29114000-01" "20109774-02"
#> [6] "19965663-01" "19992655-01" "20087008-08" "20032011-09"
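The split-and-pad idea above can also be wrapped in a small helper that pads each side of the dash on the left. This is a hedged sketch in base R rather than stringr: pad_dashed, pad_left, left_width and right_width are hypothetical names, and the function assumes each dashed value contains exactly one dash with digits on both sides.

```r
# Hypothetical helper: left-pad both sides of a dashed ID with zeros
pad_dashed <- function(x, left_width = 8, right_width = 2) {
  has_dash <- grepl("-", x, fixed = TRUE)
  parts <- strsplit(x[has_dash], "-", fixed = TRUE)
  left <- vapply(parts, `[`, character(1), 1)
  right <- vapply(parts, `[`, character(1), 2)

  # Prepend as many zeros as needed to reach the requested width
  pad_left <- function(s, width) {
    paste0(strrep("0", pmax(0, width - nchar(s))), s)
  }

  # Only the dashed values are rewritten; everything else is returned as-is
  x[has_dash] <- paste(pad_left(left, left_width),
                       pad_left(right, right_width), sep = "-")
  x
}

df <- c("20001982-02", "19933626-02", "20024861-6", "29114-1", "20109774-02",
        "19965663-01", "19992655-01", "20087008-08", "140107", "20032011-09",
        "139")
padded <- pad_dashed(df)
print(padded)
```

Padding each side separately means the single-digit suffix and the short prefix are both handled in one pass, so no prior sub() step is needed.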
Upvotes: 0