I am trying to use stringr with dplyr to extract characters surrounding vowels. When I try the code below, the str_match function throws the error:
Error in mutate_impl(.data, dots) :
Column `near_vowel` must be length 150 (the number of rows) or one, not 450
A minimal example:
library(tidyverse)
library(magrittr)
library(stringr)
iris %>%
  select(Species) %>%
  mutate(name_length = str_length(Species),
         near_vowel = str_match(Species, "(.)[aeiou](.)"))
For "virginica", for example, I would expect it to extract "vir", "gin", "nic".
Upvotes: 0
Views: 1760
There are a couple of things going on that you need to address; however, I'll present a tidy approach given what you've provided in your question.
The primary issue is that you are returning multiple values per row for near_vowel; we can fix that by nesting the results. Second, you require rowwise processing for your mutate to be sensible, and third (as noted by @Psidom), your regex will not produce your desired output. Addressing the first two, which are the core of your question:
library(dplyr)
library(stringr)
df <- iris %>%
  select(Species) %>%
  mutate(
    name_length = str_length(Species),
    near_vowel = str_extract_all(Species, "[^aeiou][aeiou][^aeiou]")
  )
head(df)
#   Species name_length near_vowel
# 1  setosa           6        set
# 2  setosa           6        set
# 3  setosa           6        set
# 4  setosa           6        set
# 5  setosa           6        set
# 6  setosa           6        set
head(df[df$Species == "virginica", ]$near_vowel)
# [[1]]
# [1] "vir" "gin"
#
# [[2]]
# [1] "vir" "gin"
#
# [[3]]
# [1] "vir" "gin"
#
# [[4]]
# [1] "vir" "gin"
#
# [[5]]
# [1] "vir" "gin"
#
# [[6]]
# [1] "vir" "gin"
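As for where the 450 in the error message comes from: as far as I can tell, str_match returns a character matrix with one column for the full match and one per capture group, so 150 rows times 3 columns gives 450 values. A minimal sketch to check this, using the pattern from the question (the variable name m is only for illustration):

```r
library(stringr)

# Pattern from the question: two capture groups around a vowel.
# as.character() is used because Species is a factor.
m <- str_match(as.character(iris$Species), "(.)[aeiou](.)")

# One row per input string; column 1 is the full match,
# columns 2 and 3 are the two capture groups.
print(dim(m))
print(length(m))
```

Because mutate expects a column of length 150 (or 1), a 150 x 3 matrix does not fit, whereas the list returned by str_extract_all has exactly one element per row.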
Edit: Updated with the str_extract_all approach offered by @neilfws; this has the added benefit of being able to drop the rowwise operation.
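On the third point, the output above only contains "vir" and "gin" for "virginica" because regex matches do not overlap: the "n" is consumed by "gin", so "nic" is never found. One possible workaround, which is my own assumption and not part of the approaches mentioned above, is to wrap the pattern in a lookahead with a capture group and pull the captured column out of str_match_all. The helper name overlapping_triples is hypothetical:

```r
library(stringr)

# Hypothetical helper: returns a list with one character vector per input.
# The lookahead (?=...) matches at a position without consuming characters,
# so the next match attempt can start one character later and overlap.
overlapping_triples <- function(x) {
  pattern <- "(?=([^aeiou][aeiou][^aeiou]))"
  # str_match_all gives one matrix per input string;
  # column 2 holds the text captured inside the lookahead.
  lapply(str_match_all(x, pattern), function(m) m[, 2])
}

res <- overlapping_triples(c("setosa", "virginica"))
print(res)
```

For "virginica" this should give "vir", "gin", "nic". Since the result is a list with one element per input string, it can be used inside mutate in the same way as the str_extract_all call, for example near_vowel = overlapping_triples(as.character(Species)).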
Upvotes: 1