jimiclapton

Reputation: 889

Substring evaluation of word components in R for NLP

I'm trying to split a given word into its overlapping two-letter combinations, so that the output is a list of those components.

E.g.

'House' becomes 'ho','ou','us','se'

Producing this outcome is relatively easy using 'substr' as below:

y <- 'house'

substr(y, start = 1, stop = 2)
substr(y, start = 2, stop = 3)
substr(y, start = 3, stop = 4)
substr(y, start = 4, stop = 5)

What I would like to do, however, is generalise this so that a word of any length is split into its component two-letter combinations.

So 'Motorcar' becomes 'mo','ot','to','or','rc','ca','ar', and so on.

Can this be done using a loop or a function? Does the length of the word need to be a condition of the function?

Any thoughts greatly appreciated.

Upvotes: 1

Views: 40

Answers (1)

Ronak Shah

Reputation: 389135

We can use substring, which is vectorised over its start and stop positions:

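get_string <- function(x) {
   # Positions 1..n of the characters in x
   inds <- seq_len(nchar(x))
   # Each pair starts at positions 1..(n-1) and stops at positions 2..n
   start <- inds[-length(inds)]
   stop <- inds[-1]
   # substring() is vectorised, so this returns all pairs at once
   substring(x, start, stop)
}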

get_string('House')
#[1] "Ho" "ou" "us" "se"

get_string('Motorcar')
#[1] "Mo" "ot" "to" "or" "rc" "ca" "ar"
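
The question asks for lower-case output and whether the word length has to be a condition. A minimal sketch of one way to handle both is given here, assuming a single word is passed in. The function name `char_ngrams` and the arguments `n` and `lower` are hypothetical, chosen for illustration, and are not part of the answer above. The length check only covers words that are too short to yield any combination.

```r
# Hypothetical helper (not from the answer above): overlapping character
# n-grams of a single word, lower-cased by default.
char_ngrams <- function(word, n = 2, lower = TRUE) {
  stopifnot(is.character(word), length(word) == 1, n >= 1)
  if (lower) word <- tolower(word)
  len <- nchar(word)
  # A word shorter than n has no n-grams, so return an empty vector.
  if (len < n) return(character(0))
  # Start positions 1..(len - n + 1); each piece is n characters long.
  starts <- seq_len(len - n + 1)
  substring(word, starts, starts + n - 1)
}

char_ngrams("House")
#[1] "ho" "ou" "us" "se"

char_ngrams("Motorcar", n = 3)
#[1] "mot" "oto" "tor" "orc" "rca" "car"
```

To process several words at once, something like `lapply(c("House", "Motorcar"), char_ngrams)` would return one character vector per word.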

Upvotes: 1
