tnabdb

Reputation: 547

Explain the behavior of `str_match_all` in the R package `stringr`

```r
st = list("amber johnson", "anhar link ari")
t = stringr::str_match_all(st, "(\\ba[a-z]+\\b)")
str(t)
# List of 2
#  $ : chr [1, 1:2] "amber" "amber"
#  $ : chr [1:2, 1:2] "anhar" "ari" "anhar" "ari"
```

Why is each match repeated like this?

Upvotes: 1

Views: 2018

Answers (1)

akuiper

Reputation: 214957

If you look at the Value section of `?str_match_all`, it says:

For str_match, a character matrix. First column is the complete match, followed by one column for each capture group. For str_match_all, a list of character matrices.
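The matrix layout is easier to see with a pattern whose capture groups do not cover the whole match. The sketch below is an illustrative example (not from the question) and assumes `stringr` is installed. The result for one input string is a matrix with one row per match and one column for the full match plus one per capture group:

```r
library(stringr)

# Illustrative pattern with two capture groups: first word and second word
m <- str_match_all("amber johnson", "(\\w+) (\\w+)")[[1]]

dim(m)   # 1 row (one match) x 3 columns (full match + 2 groups)
m[, 1]   # column 1: the complete match, "amber johnson"
m[, 2]   # column 2: capture group 1, "amber"
m[, 3]   # column 3: capture group 2, "johnson"
```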

Since your pattern contains a capture group, the result has two columns: one for the complete match and one for the capture group. Because your group wraps the entire pattern, both columns always hold the same text. If you don't want the repeated column, you can remove the group parentheses from the pattern:

```r
library(stringr)

st = list("amber johnson", "anhar link ari")
t = str_match_all(st, "\\ba[a-z]+\\b")
str(t)
```

This gives:

```r
# List of 2
#  $ : chr [1, 1] "amber"
#  $ : chr [1:2, 1] "anhar" "ari"
```
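If you need to keep the group in the pattern, one possible approach (a sketch, assuming `stringr` is installed) is to pull the capture-group column out of each matrix with base `lapply`. If you only need the full matches, `str_extract_all()` returns them directly as a list of character vectors instead of matrices. The input here is a character vector rather than a list, purely for clarity:

```r
library(stringr)

st <- c("amber johnson", "anhar link ari")

# str_match_all(): one matrix per string; column 2 holds capture group 1
m <- str_match_all(st, "(\\ba[a-z]+\\b)")
groups <- lapply(m, function(x) x[, 2])

# str_extract_all(): full matches only, as plain character vectors
ex <- str_extract_all(st, "\\ba[a-z]+\\b")

str(groups)
str(ex)
```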

Upvotes: 4
