Reputation: 121137
I want to generate some strings of numbers with lots of digits, in this case for ID values in a synthetic dataset.
For short strings of numbers, I'd use sample:
sprintf("%05.f", sample(0:(1e5-1), 18))
## [1] "54783" "80354" "53607" "99668" "63621" "07121" "15944" "27436" "96837"
## [10] "28751" "95315" "63326" "00981" "15300" "18448" "09885" "63360" "04539"
This doesn't work for longer strings. First the memory requirements get too large, then you can't make numbers big enough. For example, this doesn't work:
sprintf("%020.f", sample(0:(1e20-1), 18))
## Error in 0:(1e+20 - 1) : result would be too long a vector
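Besides the vector-length error, there is a precision problem: R stores values like 1e20 as doubles, which hold whole numbers exactly only up to 2^53 (about 9.007e15, i.e. 16 digits). A quick base R check, offered as an illustration rather than part of the original question:

```r
# Doubles represent integers exactly only up to 2^53 (about 9.007e15).
2^53 + 1 == 2^53              # TRUE: the +1 is lost to rounding
1e20 - 1 == 1e20              # TRUE: "1e20 - 1" is not a 20-digit number
sprintf("%020.f", 1e20 - 1)   # "100000000000000000000" (21 digits, not 20)
# R's integer type is smaller still: 2147483647, only 10 digits.
.Machine$integer.max
```

So even without the memory limit, numeric types cannot carry 20 exact random digits; building the strings digit by digit, as the answers below do, sidesteps this.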
How do I make strings of numbers with lots of digits?
Upvotes: 3
Views: 123
Reputation: 24510
You can use the stringi package:
require(stringi)
stri_rand_strings(10,50,pattern="[0-9]")
#[1] "33163217620361477538822791082750025522246331345665"
#[2] "85105858270154002408385176647161448078668054193081"
#[3] "62417899981033664011261714060242781925235001978704"
#[4] "17731152361720663463691231461493607438220463345863"
#[5] "06316044683426574113640145569673845269595104465896"
#[6] "17058300286927387520323781399768150137786864069558"
#[7] "86204984977415277470013113957915963393339586096213"
#[8] "56382530391794208466245591896055134584746907393458"
#[9] "61740570216902905237145952608961548203505061535222"
#[10] "28713530448562268345804947527043822080897315821103"
The first argument (n) is the number of strings to return, the second (length) is the number of characters in each string, and the third (pattern) restricts the characters to digits.
Sticking with base R, one could generate 1000 strings of 50 digits each:
apply(matrix(sample(charToRaw("0123456789"),50*1000,replace=TRUE),nrow=1000),1,rawToChar)
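The one-liner above packs several steps together. Unpacked on a small example (the variable names here are illustrative, not from the answer), it samples raw bytes for the ASCII digits, arranges them one string per row, and converts each row back to text:

```r
set.seed(1)
n_strings <- 5
n_digits  <- 8
digit_bytes <- charToRaw("0123456789")        # raw bytes 30..39, ASCII "0".."9"
draws <- sample(digit_bytes, n_strings * n_digits, replace = TRUE)
m <- matrix(draws, nrow = n_strings)          # n_strings x n_digits raw matrix
ids <- apply(m, 1, rawToChar)                 # each row becomes one string
ids
```

Working with raw bytes avoids converting each digit to a one-character string before pasting, which is one plausible reason this variant benchmarks well.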
Upvotes: 7
Reputation: 21532
And the obligatory competition:
library(magrittr)        # for %>%
library(stringi)         # for stri_rand_strings
library(microbenchmark)
GNS <- function(nNumbers, nCharsPerNumber)
{
sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
split(gl(nNumbers, nCharsPerNumber)) %>%
vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}
GNP <- function(nNumbers,nCharsPerNumber){
replicate(nNumbers,paste0(sample(0:9,nCharsPerNumber,replace=TRUE),collapse=""))
}
GST <- function(nNumbers,nCharsPerNumber){
stri_rand_strings(nNumbers,nCharsPerNumber,pattern="[0-9]")
}
microbenchmark(GNS(1000,100),GNP(1000,100),GST(1000,100),times=10)
And the scores....
Unit: milliseconds
           expr       min        lq     mean    median        uq       max neval
 GNS(1000, 100) 36.832684 38.918858 40.90260 40.750332 41.374165 46.369622    10
 GNP(1000, 100) 36.808395 39.310571 39.99557 40.094511 40.772055 44.025157    10
 GST(1000, 100)  1.882961  1.923672  2.03537  1.983199  2.166911  2.325648    10
We have a clear winner!
EDIT: adding another base option, and it's even faster.
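GSAP <- function(nNumbers,nCharsPerNumber){
  # nCharsPerNumber x nNumbers matrix: each column holds one number's digits,
  # so apply over columns (MARGIN = 2) to get nNumbers strings
  apply( matrix(sample(charToRaw("0123456789"),nNumbers*nCharsPerNumber,replace=TRUE),nrow=nCharsPerNumber),2, rawToChar ) }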
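With nrow = nCharsPerNumber, each column of the matrix holds one string, so the MARGIN passed to apply decides the output shape: applying over rows silently returns nCharsPerNumber strings of nNumbers digits instead. A small shape check (is_valid_ids and draw_matrix are hypothetical helpers, not part of the benchmark) makes the difference visible:

```r
# Hypothetical helper: does a generator return n strings of len digits each?
is_valid_ids <- function(ids, n, len) {
  length(ids) == n && all(nchar(ids) == len) && all(grepl("^[0-9]+$", ids))
}

# Hypothetical helper: nCharsPerNumber rows x nNumbers columns of digit bytes
draw_matrix <- function(nNumbers, nCharsPerNumber) {
  matrix(sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE),
         nrow = nCharsPerNumber)
}

by_rows <- apply(draw_matrix(1000, 100), 1, rawToChar)  # 100 strings of 1000 digits
by_cols <- apply(draw_matrix(1000, 100), 2, rawToChar)  # 1000 strings of 100 digits

is_valid_ids(by_rows, 1000, 100)  # FALSE
is_valid_ids(by_cols, 1000, 100)  # TRUE
```

A check like this is worth running before benchmarking, since a variant that returns the wrong shape can look fast for the wrong reason.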
Unit: microseconds
expr min lq mean median uq max
GSAP(1000, 100) 724.584 739.637 821.435 766.8345 899.06 1030.086
GNS(1000, 100) 36189.180 38316.406 39739.471 39141.5695 39965.02 44478.450
GNP(1000, 100) 35777.282 36331.839 38448.665 38575.8945 39725.21 43016.281
GST(1000, 100) 1863.803 1898.013 1944.472 1918.7110 1975.33 2122.094
EDIT number two: try bigger inputs, and get the code right this time
(time in seconds)
expr min lq mean median uq max neval
GSAP(x, y) 3.906626 3.975160 4.069103 4.049784 4.163262 4.329284 10
GNS(x, y) 33.645200 33.972587 34.513555 34.406009 35.141313 35.328662 10
GNP(x, y) 30.833180 31.136971 33.037422 32.193070 33.010896 41.713811 10
GST(x, y) 1.697303 1.706599 1.731205 1.735127 1.756961 1.763861 10
So GST wins, running more than twice as fast as GSAP (median 1.74 s vs 4.05 s).
Upvotes: 3
Reputation: 121137
Generate individual digits, parcel them out between individual numbers, then collapse the digits together.
library(magrittr)
generateNumberStrings <- function(nNumbers, nCharsPerNumber)
{
sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
split(gl(nNumbers, nCharsPerNumber)) %>%
vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}
generateNumberStrings(18, 20)
## [1] "06985095513359117867" "95278964413245221928" "75398392571928201881"
## [4] "00722065797044523279" "24475619649735183646" "29165493966488037145"
## [7] "34289922968745727406" "82354362380114534171" "84293845597888728670"
## [10] "97570546918892201649" "41421884356741221760" "99306177663904189401"
## [13] "25668966612346726451" "94949806854834288664" "43664073601604613019"
## [16] "25848242347176214032" "80736828777283687373" "83763855757083999312"
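The middle step is the least obvious: gl(nNumbers, nCharsPerNumber) builds a factor whose first nCharsPerNumber entries are level 1, the next nCharsPerNumber are level 2, and so on, so split hands each number a consecutive block of digits. On a tiny input (values chosen for illustration):

```r
digits <- c(0, 6, 9, 8, 5, 0)
groups <- gl(2, 3)                  # factor: 1 1 1 2 2 2
parts  <- split(digits, groups)     # list(`1` = c(0, 6, 9), `2` = c(8, 5, 0))
result <- vapply(parts, paste0, character(1), collapse = "", USE.NAMES = FALSE)
result
# [1] "069" "850"
```

Because each digit is turned into text separately before collapsing, leading zeros survive, which the numeric sprintf approach only achieves through zero padding.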
Upvotes: 2
Reputation: 23818
A base R alternative:
set.seed(123)
paste0(sample(0:9,50,replace=TRUE),collapse="")
#[1] "27489058549465182039866967552199670472321443112428"
EDIT: As suggested by @docendodiscimus this can be combined with replicate() to obtain an arbitrary number of such strings:
replicate(10,paste0(sample(0:9,50,replace=TRUE),collapse=""))
# [1] "27489058549465182039866967552199670472321443112428" "04715217836032848874767042363126471498811636317045"
# [3] "53494896419309715954633239101668675687943401822027" "84321352425363357242618766358583725425992396944615"
# [5] "29654832114226073489297603456964502318185616373997" "22525714489869553305800177940671320302062108789107"
# [7] "70776410443470388238821710903962783466694152439326" "19516964381183371044438459723957375912029277122119"
# [9] "91953470363824219340565386331895392614012571877136" "53202887119441522628084764602728369116489047092067"
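One difference from the question's sample(0:(1e5-1), 18): that draws without replacement, so the IDs are guaranteed distinct, whereas every answer here draws digits with replacement. For 50 digits a collision is vanishingly unlikely, but for short strings it is real. A minimal sketch of topping up until the values are distinct (unique_digit_strings is a hypothetical helper, and it only terminates when n is at most 10^len):

```r
# Hypothetical helper: redraw until n distinct digit strings are collected.
# Requires n <= 10^len, otherwise the loop can never finish.
unique_digit_strings <- function(n, len) {
  ids <- character(0)
  while (length(ids) < n) {
    need <- n - length(ids)
    new <- replicate(need, paste0(sample(0:9, len, replace = TRUE), collapse = ""))
    ids <- unique(c(ids, new))      # drop any duplicates, keep order
  }
  ids
}

set.seed(123)
ids <- unique_digit_strings(500, 3)  # 500 distinct 3-digit strings
```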
Upvotes: 6