hubert_farnsworth

Reputation: 797

How can we concatenate variables and add leading zeros in R?

Suppose I am interested in concatenating two variables. I start with a dataset like this:

#what I have
A <- rep(paste("125"),50)
B <- rep(paste("48593"),50)
C <- rep(paste("99"),50)
D <- rep(paste("1233"),50)

one <- append(A,C)
two <- append(B,D)

have <- data.frame(one,two); head(have)
  one   two
1 125 48593
2 125 48593
3 125 48593
4 125 48593
5 125 48593
6 125 48593

A straightforward paste command does the trick:

#half way there
half <- paste(one,two,sep="-");head(half)
[1] "125-48593" "125-48593" "125-48593" "125-48593" "125-48593" "125-48593"

But I actually want a dataset that looks like this:

#what I desire
E <- rep(paste("00125"),50)
F <- rep(paste("0048593"),50)
G <- rep(paste("00099"),50)
H <- rep(paste("0001233"),50)

three <- append(E,G)
four <- append(F,H)

desire <- data.frame(three,four); head(desire)
  three    four
1 00125 0048593
2 00125 0048593
3 00125 0048593
4 00125 0048593
5 00125 0048593
6 00125 0048593

So that the straightforward paste command produces this:

#but what I really want
there <-  paste(three,four,sep="-");head(there)
[1] "00125-0048593" "00125-0048593" "00125-0048593" "00125-0048593"
[5] "00125-0048593" "00125-0048593"

That is, I want the concatenation to have five digits for the first part and seven digits for the second part, with leading zeros added where needed.

Should I first transform the dataset to add the leading zeros and then do the paste command? Or can I do it all within the same line of code? I put a data.table() tag because I'm sure there is a very efficient solution there that I'm simply not aware of.

Testing the solution provided by @joran:

one <- sprintf("%05s",one)
two <- sprintf("%07s",two)
have <- data.frame(one,two); head(have)
    one     two
1 00125 0048593
2 00125 0048593
3 00125 0048593
4 00125 0048593
5 00125 0048593
6 00125 0048593
desire <- data.frame(three,four); head(desire)
  three    four
1 00125 0048593
2 00125 0048593
3 00125 0048593
4 00125 0048593
5 00125 0048593
6 00125 0048593

identical(have$one,desire$three)
[1] TRUE
identical(have$two,desire$four)
[1] TRUE

Upvotes: 1

Views: 1756

Answers (2)

Simon O'Hanlon

Reputation: 60000

Or use paste0 and paste. paste* is vectorised so you can do:

half <- paste(paste0("00",one), paste0("00",two) , sep = "-");head(half)
#[1] "00125-0048593" "00125-0048593" "00125-0048593" "00125-0048593"
#[5] "00125-0048593" "00125-0048593"

But your strings have different widths, so a fixed "00" prefix will not always give the right length. An alternative (sprintf with %s did not give the same results on my system) would be to paste on more zeros than you will ever need and then trim to the desired length:

one <-  paste0("0000000000000000",one)
two <-  paste0("0000000000000000",two)
# substring() and nchar() are vectorised, so no sapply() is needed
fst <- substring(one, first = nchar(one) - 4, last = nchar(one))
snd <- substring(two, first = nchar(two) - 6, last = nchar(two))
half <- paste( fst , snd , sep = "-");head(half)

But I agree this is not a particularly good way of doing things. I'd use sprintf if I could get that output with character data (it does work with numeric data).
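As a rough sketch of the same "pad, then you have a fixed width" idea without the over-pad-and-trim step, the number of missing zeros can be computed per element. The helper name pad_zeros is hypothetical (it is not part of base R or of the answer above), and strrep() assumes R 3.3.0 or later:

```r
# Hypothetical helper (not from the answer above): left-pad character strings
# with zeros to a fixed width, using only vectorised base R functions.
pad_zeros <- function(x, width) {
  x <- as.character(x)
  # Number of zeros each element still needs; never negative, so strings
  # that are already wide enough are returned unchanged.
  n_missing <- pmax(0L, width - nchar(x))
  paste0(strrep("0", n_missing), x)
}

# Sample values taken from the question.
one <- c("125", "99")
two <- c("48593", "1233")
half <- paste(pad_zeros(one, 5), pad_zeros(two, 7), sep = "-")
half
#[1] "00125-0048593" "00099-0001233"
```

Because this works on the strings directly, it should not depend on how sprintf treats character input on a given platform.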

Upvotes: 3

joran

Reputation: 173677

Maybe you are looking for sprintf:

sprintf("%05d",125)
[1] "00125"
sprintf("%07d",125)
[1] "0000125"

And if you are padding strings instead of integers, maybe the following (zero-padding with %s is platform-dependent, so check the output on your system):

sprintf("%07s","125")
[1] "0000125"
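For character columns like the ones in the question, one option that does not rely on %s padding is to convert to integer first and build the whole id in a single sprintf call. This is only a sketch, and it assumes every value is a whole number small enough to fit in an R integer:

```r
# Sample values taken from the question; stored as character, as in `have`.
one <- c("125", "99")
two <- c("48593", "1233")

# Convert to integer so the zero flag applies to a number (%d) rather than
# a string (%s); both padded pieces and the "-" separator share one format.
# Assumption: all values are whole numbers that fit in an R integer.
id <- sprintf("%05d-%07d", as.integer(one), as.integer(two))
id
#[1] "00125-0048593" "00099-0001233"
```

Since sprintf is vectorised, the same expression could presumably be used inside a data.table assignment, for example DT[, id := sprintf("%05d-%07d", as.integer(one), as.integer(two))], where DT is a hypothetical data.table holding the two columns.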

Upvotes: 6
