David Borges

Reputation: 105

Faster paste in data frame

I'm trying to fix a column in a data frame, but my code is taking too long. I want to find the entries that are exactly 4 characters long and prepend a zero to each of them. The data frame has 2,608,475 rows.

I've written this code in R:

i <- NULL
for (i in 1:length(cest07$CNAE.2.0.Classe)) {
  if (nchar(cest07$CNAE.2.0.Classe[i])==4) {
    cest07$CNAE.2.0.Classe[i] <- paste("0", cest07$CNAE.2.0.Classe[i], sep="")
  }
} 

Could someone help?

Upvotes: 0

Views: 1252

Answers (1)

rcs

Reputation: 68819

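Here is a vectorized version. It builds a logical index of the 4-character entries and prepends the zero to just those elements in a single call, with no loop: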

### create example data set
set.seed(1)
str_len <- rpois(25, 1.2) + 1
tmp <- sapply(str_len, function(x) paste(LETTERS[seq_len(x)], collapse=""))

tmp
#  [1] "A"     "AB"    "AB"    "ABCD"  "A"     "ABCD"  "ABCD"  "AB"    "AB"
# [10] "A"     "A"     "A"     "ABC"   "AB"    "ABC"   "AB"    "ABC"   "ABCDE"
# [19] "AB"    "ABC"   "ABCD"  "A"     "AB"    "A"     "A"

### prepend '0'
ind <- (nchar(tmp) == 4)
tmp[ind] <- paste0("0", tmp[ind])

tmp
#  [1] "A"     "AB"    "AB"    "0ABCD" "A"     "0ABCD" "0ABCD" "AB"    "AB"
# [10] "A"     "A"     "A"     "ABC"   "AB"    "ABC"   "AB"    "ABC"   "ABCDE"
# [19] "AB"    "ABC"   "0ABCD" "A"     "AB"    "A"     "A"
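
Applied to the column from the question, the same idea might look like the sketch below. The small `cest07` data frame here is a made-up stand-in for the real one, and the `as.character()` and `which()` calls are my own additions on the assumption that the column could be a factor or contain `NA` values; drop them if that does not apply.

```r
# Hypothetical stand-in for the asker's data frame; the values are invented
cest07 <- data.frame(CNAE.2.0.Classe = c("1234", "01234", "5678", "123"),
                     stringsAsFactors = FALSE)

# as.character() guards against the column being stored as a factor or number
x <- as.character(cest07$CNAE.2.0.Classe)

# positions of the 4-character entries; which() drops any NA comparisons
ind <- which(nchar(x) == 4)

# prepend "0" to those entries only, in one vectorized step
x[ind] <- paste0("0", x[ind])
cest07$CNAE.2.0.Classe <- x

cest07$CNAE.2.0.Classe
# [1] "01234" "01234" "05678" "123"
```

Because `nchar()` and `paste0()` each run once over the whole vector instead of once per row, this should scale far better than the loop on a couple of million rows.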

Upvotes: 3
