Reputation: 2763
Say I want to count the number of "a"s and "p"s in the word "apple". I can do:
library(stringr)
sum(str_count("apple", c("a", "p")))
But when I apply the same logic to count the "a"s and "p"s in each word of a data frame column, it doesn't work, e.g.:
library(dplyr)
dat <- tibble(id = 1:4, word = c("apple", "banana", "pear", "pineapple"))
dat <- dat %>% mutate(num_ap = sum(str_count(word, c("a", "p"))))
The variable "num_ap" should read c(3, 3, 2, 4), but instead it reads c(5, 5, 5, 5).
Does anyone know why this isn't working for me?
Thanks!
Upvotes: 2
Views: 419
Reputation: 887088
Using base R
# delete every character that is not "a" or "p", then count what is left
dat$num_ap <- nchar(gsub("[^ap]", "", dat$word))
-output
> dat
id word num_ap
1 1 apple 3
2 2 banana 3
3 3 pear 2
4 4 pineapple 4
dat <- structure(list(id = 1:4, word = c("apple", "banana", "pear",
"pineapple")), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 1
Reputation: 21400
Two solutions (both without sum):
with rowwise():
library(dplyr)
library(stringr)
dat %>%
rowwise() %>%
mutate(num_ap = str_count(word, "a|p"))
id word num_ap
1 1 apple 3
2 2 banana 3
3 3 pear 2
4 4 pineapple 4
with lengths and str_extract_all:
library(dplyr)
library(stringr)
dat %>%
mutate(num_ap = lengths(str_extract_all(word, "a|p")))
id word num_ap
1 1 apple 3
2 2 banana 3
3 3 pear 2
4 4 pineapple 4
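A side note on the first variant: str_count() is vectorised over its string argument, so with a single pattern rowwise() should not be strictly necessary. A minimal sketch, assuming the same four words as in the question (the character class "[ap]" is my substitution for "a|p" and matches the same single letters):

```r
library(stringr)

# Same example data as in the question, as a plain data frame
dat <- data.frame(id = 1:4,
                  word = c("apple", "banana", "pear", "pineapple"))

# One pattern, many strings: str_count() returns one count per word
dat$num_ap <- str_count(dat$word, "[ap]")
dat$num_ap
```

This only works because there is a single pattern; with a vector of several patterns you are back to the pairing problem from the question.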
Upvotes: 2
Reputation: 171
In cases like this it helps to backtrack the issue.
str_count(dat$word, c("a", "p")) by itself will return [1] 1 0 1 3. The two patterns are recycled along the four words, so these numbers are the count of "a" in "apple", "p" in "banana", "a" in "pear" and "p" in "pineapple", not the combined count for each word. If you take the sum of that vector with sum(str_count(dat$word, c("a", "p"))), you get [1] 5. Since mutate() works on the whole column at once rather than row by row, that single value of 5 is assigned to every row, which is consistent with your results.
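To make the recycling visible, here is a small sketch that spells out which pattern gets paired with which word. It uses rep_len() to do the pairing explicitly (my choice, not part of the original code), partly because, as far as I know, stringr 1.5.0 and later no longer recycle a length-2 pattern against four strings and throw an error instead. The variable names are made up for illustration.

```r
library(stringr)

words <- c("apple", "banana", "pear", "pineapple")

# Spell out the recycling: the patterns become "a", "p", "a", "p"
patterns <- rep_len(c("a", "p"), length(words))

# Each word is matched against only ONE pattern
recycled <- str_count(words, patterns)
recycled        # 1 0 1 3
sum(recycled)   # 5, the value that ended up in every row

# What was actually wanted: both letters counted in every word
per_word <- str_count(words, "a") + str_count(words, "p")
per_word        # 3 3 2 4
```

Comparing recycled with per_word shows that the sum of 5 never contained the per-word totals in the first place.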
To fix this, note that rowwise() (part of the dplyr package) lets you work on each row individually. Hence, modifying your code to incorporate rowwise() will solve your problem:
dat <- dat %>% rowwise() %>% mutate(num_ap = sum(str_count(word, c("a", "p"))))
Upvotes: 3
Reputation: 5254
Use sapply to apply the transformation to each element of dat$word:
library(stringr)
dat <- data.frame(id = 1:4, word = c(c("apple", "banana", "pear", "pineapple")))
dat$num_ap <- sapply(dat$word, function(x) sum(str_count(x, c("a", "p"))))
dat
#> id word num_ap
#> 1 1 apple 3
#> 2 2 banana 3
#> 3 3 pear 2
#> 4 4 pineapple 4
Created on 2021-10-14 by the reprex package (v2.0.1)
Upvotes: 2