Adam_G

Reputation: 7879

Passing a single column in dplyr mutate

I am trying to use stringr with dplyr to extract the characters surrounding vowels. When I run the code below, the mutate call containing str_match throws this error:

Error in mutate_impl(.data, dots) : 
  Column `near_vowel` must be length 150 (the number of rows) or one, not 450

A minimal example:

library(tidyverse)
library(magrittr)
library(stringr)
iris %>%
  select(Species) %>%
  mutate(name_length = str_length(Species),
         near_vowel = str_match(Species, "(.)[aeiou](.)"))

I would expect with, e.g. "virginica", it would extract "vir", "gin", "nic".

Upvotes: 0

Views: 1760

Answers (1)

Kevin Arseneau

Reputation: 6264

There are a few things going on that you need to address. I'll present a tidy approach based on what you've provided in your question.

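The primary issue is that you are returning multiple values per row for near_vowel: str_match returns a matrix with one column for the full match plus one per capture group, so 150 rows times three columns gives the 450 in the error message. We can fix that by nesting the results in a list column. Second, the extraction must yield exactly one result per row for your mutate to be sensible, which a vectorised function such as str_extract_all does without any rowwise step. Thirdly (as noted by @Psidom), your regex will not produce your desired output. Addressing the first two, being the core of your question: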

library(dplyr)
library(stringr)

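df <- iris %>%
  select(Species) %>%
  mutate(
    name_length = str_length(Species),
    # str_extract_all returns a list with one character vector per row,
    # so near_vowel becomes a list column with 150 elements
    near_vowel = str_extract_all(Species, "[^aeiou][aeiou][^aeiou]")
  )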

head(df)

#   Species name_length near_vowel
# 1  setosa           6        set
# 2  setosa           6        set
# 3  setosa           6        set
# 4  setosa           6        set
# 5  setosa           6        set
# 6  setosa           6        set

head(df[df$Species == "virginica", ]$near_vowel)

# [[1]]
# [1] "vir" "gin"
# 
# [[2]]
# [1] "vir" "gin"
# 
# [[3]]
# [1] "vir" "gin"
# 
# [[4]]
# [1] "vir" "gin"
# 
# [[5]]
# [1] "vir" "gin"
# 
# [[6]]
# [1] "vir" "gin"
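
To see where the 450 in the error message comes from, it may help to compare the shapes of the two results outside of mutate. This is only a quick check using the same patterns as above; the variable names are made up for illustration.

```r
library(stringr)

# Illustrative names, not from the original question
species <- as.character(iris$Species)

# str_match returns a character matrix: the full match plus one column
# per capture group, so 150 rows by 3 columns
m <- str_match(species, "(.)[aeiou](.)")
dim(m)
length(m)   # 450 elements, the number mutate complained about

# str_extract_all returns a list with one element per input string,
# which mutate can store as a list column
l <- str_extract_all(species, "[^aeiou][aeiou][^aeiou]")
length(l)
```

A list of length 150 satisfies mutate's "number of rows" requirement, whereas the matrix is treated as 450 separate values.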

Edit: Updated with the str_extract_all approach offered by @neilfws, which has the added benefit of letting us drop the rowwise operation.
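
On the third point, the pattern above finds non-overlapping matches, which is why "nic" is missing for "virginica": the "n" is already consumed by "gin". One possible way to get overlapping matches is to wrap the pattern in a lookahead and read the capture group with str_match_all. This is a sketch rather than part of the original answer, and overlapping_cvc is a hypothetical helper name.

```r
library(stringr)

# Hypothetical helper: returns every consonant-vowel-consonant triple,
# including overlapping ones, as a list with one vector per input string.
overlapping_cvc <- function(x) {
  # The lookahead (?=...) consumes no characters, so a match can start at
  # every position; the triple itself is captured by the inner group,
  # which ends up in column 2 of each result matrix.
  matches <- str_match_all(x, "(?=([^aeiou][aeiou][^aeiou]))")
  lapply(matches, function(m) m[, 2])
}

res <- overlapping_cvc("virginica")
res[[1]]
```

Because the helper returns a list with one element per input string, it could be dropped into the mutate call in place of str_extract_all, for example near_vowel = overlapping_cvc(as.character(Species)).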

Upvotes: 1
