patrick

Reputation: 380

Split a string into characters efficiently

I am looking for an efficient way to split words into individual characters (the data includes special characters such as quotes, commas, parentheses, and periods). I have a working version that uses a loop and the substring function, but it is very slow.

Example input code:

words <- data.frame(V1 = c("blibli","blabla","\"","]"))
words$V1 <- as.character(words$V1)

The input looks like:

      V1
1 blibli
2 blabla
3      "
4      ]

The code I have written so far:

char_df <- NULL
for(i in 1:nrow(words)){
  print(i)
  temp <- substring(words[i,][1],1:nchar(words[i,]),1:nchar(words[i,]))
  char_df <- rbind(char_df,
                   data.frame(char = temp,
                              idx = 1:nchar(words[i,]) )
  )

}

Expected output:

   char idx
1     b   1
2     l   2
3     i   3
4     b   4
5     l   5
6     i   6
7     b   1
8     l   2
9     a   3
10    b   4
11    l   5
12    a   6
13    "   1
14    ]   1

I am open to any approach: dplyr, data.table, or base R.

Upvotes: 2

Views: 1006

Answers (2)

amonk

Reputation: 1795

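Additionally, the stringi package offers a neat way to do this: extract every match of the regex "." (any single character) from each string and unlist the result.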

library(stringi)
x <- c("dog", "cat", "@@$")
unlist(stri_extract_all(x, regex = "."))
# [1] "d" "o" "g" "c" "a" "t" "@" "@" "$"
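
The snippet above returns a flat character vector, without the per-word index the question asks for. A minimal sketch of one way to get the expected char/idx layout with stringi, assuming the question's words data and keeping the per-word list instead of unlisting it straight away (the name chars is only illustrative):

```r
library(stringi)

# Rebuild the question's input; stringsAsFactors = FALSE keeps V1 as
# character on R versions before 4.0 as well.
words <- data.frame(V1 = c("blibli", "blabla", "\"", "]"),
                    stringsAsFactors = FALSE)

# One character vector per word: every match of "." (any single character).
chars <- stri_extract_all_regex(words$V1, ".")

# Flatten the characters and restart the index at 1 for each word.
char_df <- data.frame(char = unlist(chars),
                      idx  = sequence(lengths(chars)),
                      stringsAsFactors = FALSE)
```

Note that "." does not match a newline by default, so this sketch assumes the strings contain no line breaks.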

Upvotes: 2

akrun

Reputation: 887048

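Split 'V1' on the empty string "" to get a list with one character vector per word. lengths() then gives the number of characters in each word, sequence() turns those counts into an index that restarts at 1 for every word, and unlisting the list gives the characters for the data.frame.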

lst <- strsplit(words$V1, "")
data.frame(char = unlist(lst), idx = sequence(lengths(lst)))
#   char idx
#1     b   1
#2     l   2
#3     i   3
#4     b   4
#5     l   5
#6     i   6
#7     b   1
#8     l   2
#9     a   3
#10    b   4
#11    l   5
#12    a   6
#13    "   1
#14    ]   1
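
To make the intermediate steps explicit, here is the same approach broken into stages on the question's data. This is only an illustrative sketch; the variable names n_chars and idx are not part of the original answer.

```r
# Rebuild the question's input; stringsAsFactors = FALSE keeps V1 as
# character on R versions before 4.0 as well.
words <- data.frame(V1 = c("blibli", "blabla", "\"", "]"),
                    stringsAsFactors = FALSE)

# Step 1: split every string on the empty pattern, giving one character
# vector per word.
lst <- strsplit(words$V1, "")

# Step 2: number of characters in each word.
n_chars <- lengths(lst)      # 6 6 1 1

# Step 3: sequence() concatenates 1:6, 1:6, 1:1, 1:1, so the index
# restarts at 1 for each word.
idx <- sequence(n_chars)     # 1 2 3 4 5 6 1 2 3 4 5 6 1 1

# Step 4: flatten the list and pair each character with its index.
char_df <- data.frame(char = unlist(lst), idx = idx,
                      stringsAsFactors = FALSE)
```

Because every step operates on whole vectors, this avoids the repeated rbind() calls of the loop version, which copy the growing data frame on each iteration.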

Upvotes: 3
