ATMathew

Reputation: 12856

Finding the number of words in each row

Let's say that I want to find the number of words in each row of a data frame. In the following example, I want to find that the first value in column one has 3 words, the second value has 4 words, and so on. I assume this is a task for one of the apply functions, but I'm having little luck figuring it out.

dat = data.frame(one=c("That is Cool",
  "I like my bank", "He likes pizza", "What"))

Do I need to work with strsplit(), or is it better to use apply() with a custom function, e.g. apply(dat, 1, function(x) ...)?

Upvotes: 3

Views: 128

Answers (2)

Marek

Reputation: 50704

Another approach is based on regular expressions. The idea is to remove everything except spaces and compute the length of the modified string, i.e. the number of spaces. Adding 1 gives the number of words:

nchar(gsub("[^ ]", "", dat$one)) + 1
# [1] 3 4 3 1
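To make the intermediate values visible, here is the same expression split into steps. This is a sketch that rebuilds the question's data with stringsAsFactors = FALSE (an addition, so the column is character); the variable names spaces_only, n_spaces and n_words are illustrative only.

```r
# Question's data; stringsAsFactors = FALSE is an added assumption.
dat <- data.frame(one = c("That is Cool", "I like my bank", "He likes pizza", "What"),
                  stringsAsFactors = FALSE)

# Step 1: delete every character that is not a space.
spaces_only <- gsub("[^ ]", "", dat$one)
spaces_only        # "  " "   " "  " ""

# Step 2: the length of what remains is the number of spaces.
n_spaces <- nchar(spaces_only)
n_spaces           # 2 3 2 0

# Step 3: with exactly one space between words, words = spaces + 1.
n_words <- n_spaces + 1
n_words            # 3 4 3 1
```

The "+ 1" only holds when words are separated by single spaces and the string has no leading or trailing spaces, which is what the next refinement addresses.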

You could also add protection to handle strings with spaces at the beginning or end:

nchar(gsub("[^ ]|^ *| *$", "", dat$one)) + 1
# [1] 3 4 3 1

Examples:

x <- c(" One two ", "One Two ", " One two")
nchar(gsub("[^ ]", "", x)) + 1
# [1] 4 3 3
sapply(strsplit(x, " "), length)
# [1] 3 2 3
nchar(gsub("[^ ]|^ *| *$", "", x)) + 1
# [1] 2 2 2
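If your R version has trimws() (base R since 3.2.0), a possibly simpler variant, swapped in here rather than taken from the answer, is to strip leading and trailing whitespace first and keep the original space-counting expression unchanged:

```r
x <- c(" One two ", "One Two ", " One two")

# trimws() removes leading/trailing whitespace, so only the spaces
# between words are left to be counted.
n_words <- nchar(gsub("[^ ]", "", trimws(x))) + 1
n_words  # 2 2 2
```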

One more safety check, to deal with repeated spaces:

x <- " One    Two    "
nchar(gsub("[^ ]|^ *| *$", "", gsub(" +", " ", x))) + 1 
# [1] 2
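As an alternative to stacking several gsub() calls (a different technique, not part of this answer), you can count runs of non-space characters directly. Each run is one word, so leading, trailing and repeated spaces need no special handling. This sketch still assumes words are separated by spaces only; tabs would be counted as part of a word.

```r
x <- c(" One    Two    ", " One two ", "One Two ", "What", "")

# gregexpr() locates every run of one or more non-space characters;
# regmatches() extracts them (character(0) when there is no match).
words <- regmatches(x, gregexpr("[^ ]+", x))

# The word count is the number of extracted runs per string.
n_words <- sapply(words, length)
n_words  # 2 2 2 1 0
```

Note that an empty string gives 0 here, whereas the "+ 1" expressions above return 1 for it.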

Upvotes: 2

Joshua Ulrich

Reputation: 176668

The code below should do it, assuming all the words are separated by single spaces.

sapply(strsplit(as.character(dat$one), " "), length)
# [1] 3 4 3 1
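The question also asks whether apply() is needed. For comparison, a minimal sketch of the row-wise apply() route (the names n_words_apply and n_words_vec are illustrative): it gives the same counts but is more roundabout, because apply() first coerces the data frame to a character matrix and calls the function once per row. The as.character() in the vectorized line matters because strsplit() rejects factors, and before R 4.0.0 data.frame() converted strings to factors by default.

```r
# Question's data; stringsAsFactors = FALSE is an added assumption.
dat <- data.frame(one = c("That is Cool", "I like my bank", "He likes pizza", "What"),
                  stringsAsFactors = FALSE)

# Row-wise: each row arrives as a named character vector.
n_words_apply <- apply(dat, 1, function(row) length(strsplit(row[["one"]], " ")[[1]]))

# Vectorized over the column, as in the answer above.
n_words_vec <- sapply(strsplit(as.character(dat$one), " "), length)

n_words_apply  # 3 4 3 1
```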

Upvotes: 6
