Reputation: 12856
Let's say I want to count the words in each row of a data frame. In the example below, the first value in column one has 3 words, the second has 4 words, and so on. I assume this is a job for one of the apply functions, but I'm having little luck figuring it out.
dat = data.frame(one = c("That is Cool",
                         "I like my bank", "He likes pizza", "What"))
Do I need to work with strsplit(), or is it better to use apply() with a custom function, something like apply(dat, 1, function(x) ...)?
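For comparison, here is a minimal sketch of how the apply() idea could be completed. It assumes words are separated by single spaces. Note that apply() first converts the data frame to a character matrix, so each row arrives as a character vector and strsplit() still does the real work; word_counts is just an illustrative name.

```r
dat <- data.frame(one = c("That is Cool",
                          "I like my bank", "He likes pizza", "What"))

# apply() turns dat into a character matrix; x is one row of it.
# x[1] is the value from column "one" (assumes single-space separators).
word_counts <- apply(dat, 1, function(x) length(strsplit(x[1], " ")[[1]]))
word_counts
# [1] 3 4 3 1
```

Since strsplit() is already vectorised over the whole column, the row-wise apply() adds nothing here except overhead.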
Upvotes: 3
Views: 128
Reputation: 50704
Another approach uses regular expressions. The idea is to remove everything except spaces and compute the length of the modified string (i.e. the number of spaces; add 1 to get the number of words):
nchar(gsub("[^ ]", "", dat$one)) + 1
# [1] 3 4 3 1
You can also guard against strings with spaces at the beginning or end:
nchar(gsub("[^ ]|^ *| *$", "", dat$one)) + 1
# [1] 3 4 3 1
Examples:
x <- c(" One two ", "One Two ", " One two")
nchar(gsub("[^ ]", "", x)) + 1
# [1] 4 3 3
sapply(strsplit(x, " "), length)
# [1] 3 2 3
nchar(gsub("[^ ]|^ *| *$", "", x)) + 1
# [1] 2 2 2
One more safety check: handle repeated spaces by collapsing them into one first:
x <- "  One   Two  "
nchar(gsub("[^ ]|^ *| *$", "", gsub(" +", " ", x))) + 1
# [1] 2
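An alternative sketch that sidesteps the leading/trailing/repeated-space cases altogether is to count runs of non-whitespace characters instead of counting spaces. This swaps in base R's gregexpr() and regmatches() for the gsub() trick; count_words is a hypothetical helper name. Because it uses [[:space:]], tabs and newlines also count as separators, and an empty string gives 0 words rather than 1.

```r
# Hypothetical helper: count runs of non-whitespace characters.
# Leading, trailing and repeated whitespace are ignored; "" gives 0.
count_words <- function(s) {
  lengths(regmatches(s, gregexpr("[^[:space:]]+", s)))
}

x <- c(" One two ", "One Two ", "  One   Two  ", "What", "")
count_words(x)
# [1] 2 2 2 1 0
```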
Upvotes: 2
Reputation: 176668
The code below should do it, assuming the words are separated by single spaces.
sapply(strsplit(as.character(dat$one), " "), length)
# [1] 3 4 3 1
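A slightly shorter variant, assuming R 3.2.0 or later where base R provides lengths(), which returns the length of each element of a list directly instead of going through sapply(..., length). The as.character() call matters only when the column is a factor (the data.frame() default before R 4.0.0).

```r
dat <- data.frame(one = c("That is Cool",
                          "I like my bank", "He likes pizza", "What"))

# lengths() gives the number of pieces strsplit() produced for each string
n_words <- lengths(strsplit(as.character(dat$one), " "))
n_words
# [1] 3 4 3 1
```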
Upvotes: 6