Richie Cotton

Reputation: 121137

How do you generate long strings of numbers?

I want to generate some strings of numbers with lots of digits, in this case for ID values in a synthetic dataset.

For short strings of numbers, I'd use sample:

sprintf("%05.f", sample(0:(1e5-1), 18))
##  [1] "54783" "80354" "53607" "99668" "63621" "07121" "15944" "27436" "96837"
## [10] "28751" "95315" "63326" "00981" "15300" "18448" "09885" "63360" "04539"

This doesn't work for longer strings. First the memory requirements get too large, then the numbers themselves become too big for R to represent exactly. For example, this doesn't work:

sprintf("%020.f", sample(0:(1e20-1), 18))
## Error in 0:(1e+20 - 1) : result would be too long a vector
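The second limit can be seen directly. As a small base R illustration (the variable name `limit` is only for this example): doubles represent integers exactly only up to 2^53, which has 16 digits, so a 20-digit value cannot be stored digit for digit.

```r
# Largest magnitude up to which a double represents every integer exactly
limit <- 2^53
sprintf("%.0f", limit)      # "9007199254740992"
sprintf("%.0f", limit + 1)  # also "9007199254740992": the +1 is lost

# The native integer type tops out far earlier
.Machine$integer.max        # 2147483647
```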

How do I make strings of numbers with lots of digits?

Upvotes: 3

Views: 123

Answers (4)

nicola

Reputation: 24510

You can use the stringi package:

 require(stringi)
 stri_rand_strings(10,50,pattern="[0-9]")
 #[1] "33163217620361477538822791082750025522246331345665"
 #[2] "85105858270154002408385176647161448078668054193081"
 #[3] "62417899981033664011261714060242781925235001978704"
 #[4] "17731152361720663463691231461493607438220463345863"
 #[5] "06316044683426574113640145569673845269595104465896"
 #[6] "17058300286927387520323781399768150137786864069558"
 #[7] "86204984977415277470013113957915963393339586096213"
 #[8] "56382530391794208466245591896055134584746907393458"
 #[9] "61740570216902905237145952608961548203505061535222"
 #[10] "28713530448562268345804947527043822080897315821103"

The first argument is the length of the resulting vector, the second is the number of characters in each string, and the third is a pattern that restricts the characters to digits.

Sticking with base R, one could generate 1000 strings of 50 digits each:

apply(matrix(sample(charToRaw("0123456789"),50*1000,replace=TRUE),nrow=1000),1,rawToChar)
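As a sketch, the one-liner above can be wrapped in a function. The name `random_digit_strings` and its argument names are hypothetical, chosen here for illustration. The matrix is built with `nrow = n` so that applying over rows yields one string per row.

```r
# Hypothetical helper: n strings, each made of n_digits random digits
random_digit_strings <- function(n, n_digits) {
  digits <- charToRaw("0123456789")
  # One row per string, one column per digit position
  m <- matrix(sample(digits, n * n_digits, replace = TRUE), nrow = n)
  # Convert each row of raw bytes back into a single string
  apply(m, 1, rawToChar)
}

set.seed(1)
ids <- random_digit_strings(1000, 50)
length(ids)         # 1000
unique(nchar(ids))  # 50
```

Checking `length()` and `nchar()` on the result is a cheap way to confirm that the rows and columns were not swapped.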

Upvotes: 7

Carl Witthoft

Reputation: 21532

And the obligatory competition:

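library(magrittr)        # for %>%
library(stringi)         # for stri_rand_strings()
library(microbenchmark)

# Richie Cotton's split/vapply approach
GNS <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>%
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}

# RHertel's replicate/paste0 approach
GNP <- function(nNumbers, nCharsPerNumber)
{
  replicate(nNumbers, paste0(sample(0:9, nCharsPerNumber, replace = TRUE), collapse = ""))
}

# nicola's stringi approach
GST <- function(nNumbers, nCharsPerNumber)
{
  stri_rand_strings(nNumbers, nCharsPerNumber, pattern = "[0-9]")
}

# times = 10 sets the number of runs; a bare 10 would be timed as a fourth expression
microbenchmark(GNS(1000, 100), GNP(1000, 100), GST(1000, 100), times = 10)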

And the scores....

Unit: milliseconds
           expr       min        lq     mean    median        uq       max neval
 GNS(1000, 100) 36.832684 38.918858 40.90260 40.750332 41.374165 46.369622    10
 GNP(1000, 100) 36.808395 39.310571 39.99557 40.094511 40.772055 44.025157    10
 GST(1000, 100)  1.882961  1.923672  2.03537  1.983199  2.166911  2.325648    10

We have a clear winner!

EDIT: adding another base option, and it's even faster.

GSAP <- function(nNumbers, nCharsPerNumber)
{
  # One row per string, so that apply() over rows returns nNumbers strings
  apply(matrix(sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE),
               nrow = nNumbers), 1, rawToChar)
}

Unit: microseconds
            expr       min        lq      mean     median       uq       max
 GSAP(1000, 100)   724.584   739.637   821.435   766.8345   899.06  1030.086
  GNS(1000, 100) 36189.180 38316.406 39739.471 39141.5695 39965.02 44478.450
  GNP(1000, 100) 35777.282 36331.839 38448.665 38575.8945 39725.21 43016.281
  GST(1000, 100)  1863.803  1898.013  1944.472  1918.7110  1975.33  2122.094

EDIT number two: try bigger inputs, and get the code right this time.

(times in seconds)

     expr       min        lq      mean    median        uq       max neval
 GSAP(x, y)  3.906626  3.975160  4.069103  4.049784  4.163262  4.329284    10
  GNS(x, y) 33.645200 33.972587 34.513555 34.406009 35.141313 35.328662    10
  GNP(x, y) 30.833180 31.136971 33.037422 32.193070 33.010896 41.713811    10
  GST(x, y)  1.697303  1.706599  1.731205  1.735127  1.756961  1.763861    10

So GST wins: its median of about 1.7 s is less than half of GSAP's 4.0 s, and both are far ahead of GNS and GNP.

Upvotes: 3

Richie Cotton

Reputation: 121137

Generate individual digits, parcel them out between individual numbers, then collapse the digits together.

library(magrittr)
generateNumberStrings <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>% 
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}

generateNumberStrings(18, 20)
##  [1] "06985095513359117867" "95278964413245221928" "75398392571928201881"
##  [4] "00722065797044523279" "24475619649735183646" "29165493966488037145"
##  [7] "34289922968745727406" "82354362380114534171" "84293845597888728670"
## [10] "97570546918892201649" "41421884356741221760" "99306177663904189401"
## [13] "25668966612346726451" "94949806854834288664" "43664073601604613019"
## [16] "25848242347176214032" "80736828777283687373" "83763855757083999312"

Upvotes: 2

RHertel

Reputation: 23818

A base R alternative:

set.seed(123)
paste0(sample(0:9,50,replace=TRUE),collapse="")
#[1] "27489058549465182039866967552199670472321443112428"

EDIT: As suggested by @docendodiscimus this can be combined with replicate() to obtain an arbitrary number of such strings:

replicate(10,paste0(sample(0:9,50,replace=TRUE),collapse=""))
# [1] "27489058549465182039866967552199670472321443112428" "04715217836032848874767042363126471498811636317045"
# [3] "53494896419309715954633239101668675687943401822027" "84321352425363357242618766358583725425992396944615"
# [5] "29654832114226073489297603456964502318185616373997" "22525714489869553305800177940671320302062108789107"
# [7] "70776410443470388238821710903962783466694152439326" "19516964381183371044438459723957375912029277122119"
# [9] "91953470363824219340565386331895392614012571877136" "53202887119441522628084764602728369116489047092067"
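Since the question is about ID values, it may matter that independently sampled strings are not guaranteed to be distinct, although a collision is extremely unlikely at 50 digits. A minimal sketch, assuming uniqueness is required; `unique_digit_strings` is a hypothetical name, not something from the answers here:

```r
# Hypothetical helper: keep drawing until n distinct strings are collected
unique_digit_strings <- function(n, n_digits) {
  stopifnot(n <= 10^n_digits)  # cannot ask for more distinct strings than exist
  ids <- character(0)
  while (length(ids) < n) {
    # Draw only as many new strings as are still missing
    new <- replicate(
      n - length(ids),
      paste0(sample(0:9, n_digits, replace = TRUE), collapse = "")
    )
    ids <- unique(c(ids, new))
  }
  ids
}

set.seed(123)
ids <- unique_digit_strings(50, 2)  # short strings, so duplicates are likely to occur
anyDuplicated(ids)                  # 0
```

For long strings the loop will almost always finish in one pass, so the cost over plain `replicate()` is essentially the `unique()` call.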

Upvotes: 6
