Glomek

Reputation: 71

Word splitting speed up in R

I've written a function that splits a word into single letters and then creates a two-column data frame holding those letters and their position in the original word, expressed as a fraction from 0 to 1. It looks like this:

library(dplyr)  # data_frame() comes from dplyr/tibble

pozycje.literek <- function(slowo){
  literki <- unlist(strsplit(slowo, ""))                         # split word into single letters
  liczby <- seq(0, length(literki) - 1) / (length(literki) - 1)  # relative position, 0 to 1
  pozycje <- data_frame(literki, liczby)
  return(pozycje)
}
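For example, a three-letter word yields one row per letter (output shown as a tibble; the exact printed form may vary by package version):

pozycje.literek("abc")
# # A tibble: 3 x 2
#   literki liczby
#   <chr>    <dbl>
# 1 a          0
# 2 b          0.5
# 3 c          1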

The function does what I need, but it is awfully slow. With the example below of 10 thousand elements it took 52 seconds (timing just the second loop, not the generation of the random example vector of characters). And the vectors I'm dealing with have over 500 thousand elements.

wektor <- vector()
for (i in 1:10000) {
  # random word of 3-10 letters drawn from a-x
  wektor[i] <- paste0(sample(letters[1:24], round(runif(1, 3, 10), 0)), collapse = "")
}

tabelka <- data.frame()
system.time(for (i in wektor) {
  # this builds the table for a given country; from here on the code is shared,
  # since everything is switched to 'tabelka'
  tabelka <- rbind(tabelka, pozycje.literek(i))
})

Any idea how to speed it up? I couldn't think of a way to use the apply family here, but I believe there might be one. Or could the job my function does be done in a completely different way?

Upvotes: 0

Views: 59

Answers (1)

minem

Reputation: 3650

The main cost is growing the data frame with rbind() inside the loop. strsplit() is already vectorised, so you can split the whole vector in one call and build the result in a single step:

literki <- strsplit(wektor, "")                           # one vectorised call instead of a loop
x <- lengths(literki)                                     # letter count of each word
liczby <- lapply(x, function(x) seq(0, x - 1) / (x - 1))  # relative positions per word
pozycje <- data_frame(unlist(literki), unlist(liczby))    # single data frame construction
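A minimal sketch of how this might be wrapped up and timed against the original loop (the function name and column names below are illustrative, not part of the answer):

library(dplyr)  # for data_frame()

pozycje.literek.wektor <- function(wektor) {
  literki <- strsplit(wektor, "")
  x <- lengths(literki)
  liczby <- lapply(x, function(n) seq(0, n - 1) / (n - 1))
  data_frame(literki = unlist(literki), liczby = unlist(liczby))
}

system.time(tabelka2 <- pozycje.literek.wektor(wektor))  # compare with the 52 s loop above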

Upvotes: 3
