Reputation: 2297
I have data like the sample below, and I only want to extract the numbers that follow an H. How can I get them out?
The sample data can be built with this code:
df2<-structure(list(N1 = c("H#7", "H#7 W#8", "H#7,H#8", "H#07", "#H/7",
"#/\\W7", "W#16 A/H# 2 A/H #4", "H7 and H8")), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 1
Views: 47
Reputation: 1250
Try:
library(dplyr)
library(stringr)
library(purrr)
df2 <- df2 %>%
  mutate(Output = str_extract_all(N1, "H[^H]*\\d+") %>%
           map_chr(., ~ paste(str_extract(., pattern = "\\d+"), collapse = ", ")))
df2
Output:
# A tibble: 8 x 2
N1 Output
<chr> <chr>
1 "H#7" "7"
2 "H#7 W#8" "7"
3 "H#7,H#8" "7, 8"
4 "H#07" "07"
5 "#H/7" "7"
6 "#/\\W7" ""
7 "W#16 A/H# 2 A/H #4" "2, 4"
8 "H7 and H8" "7, 8"
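One caveat worth checking before relying on this pattern: `[^H]*` is greedy and accepts any character except H, so a match can run past unrelated text until it reaches the last digits before the next H. The sketch below uses a hypothetical input string (not part of the sample data) to show the effect, and compares it with a tighter pattern that only allows `#`, `/` or spaces between the H and the digits.

```r
library(stringr)

# Hypothetical input: the only number belongs to W, not to H
x <- "H and W#8"

# The greedy pattern spans from H all the way to the digits after W#
span <- str_extract_all(x, "H[^H]*\\d+")[[1]]
span
# [1] "H and W#8"

# Taking the first run of digits from that span returns W's number
first_digits <- str_extract(span, "\\d+")
first_digits
# [1] "8"

# A tighter pattern that only allows '#', '/' or spaces after H finds nothing
strict <- str_extract_all(x, "H[#/ ]*?\\d+")[[1]]
length(strict)
# [1] 0
```

For the eight sample rows this does not change the result, because every H there is followed closely by its own number.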
Upvotes: 2
Reputation: 887213
One option is to extract only the substrings that start with 'H', followed by any #, / or space characters and then one or more digits (\\d+). Then remove the H, # and / characters from each list element with str_remove_all, convert the result to numeric (which also drops leading zeros), and use an if/else condition to paste # as a prefix, returning an empty string when nothing matched.
library(dplyr)
library(stringr)
library(purrr)
df2 %>%
  mutate(Outcome = map_chr(str_extract_all(N1, "H[#/ ]*?\\d+"),
    ~ {
      tmp <- as.numeric(str_remove_all(.x, "[H#/]"))
      if(length(tmp) > 0) str_c("#", tmp, collapse=", ") else ""
    }))
Output:
# A tibble: 8 x 2
# N1 Outcome
# <chr> <chr>
#1 "H#7" "#7"
#2 "H#7 W#8" "#7"
#3 "H#7,H#8" "#7, #8"
#4 "H#07" "#7"
#5 "#H/7" "#7"
#6 "#/\\W7" ""
#7 "W#16 A/H# 2 A/H #4" "#2, #4"
#8 "H7 and H8" "#7, #8"
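To make the individual steps easier to follow, here is a minimal sketch that applies the same pattern to a single string taken from row 7 of the sample data. The variable names `pieces`, `nums` and `result` are only illustrative.

```r
library(stringr)

x <- "W#16 A/H# 2 A/H #4"

# Step 1: keep the pieces that start with H, then any of '#', '/' or space,
# then one or more digits
pieces <- str_extract_all(x, "H[#/ ]*?\\d+")[[1]]
pieces
# [1] "H# 2" "H #4"

# Step 2: drop H, # and /; as.numeric() tolerates the leftover leading space
nums <- as.numeric(str_remove_all(pieces, "[H#/]"))
nums
# [1] 2 4

# Step 3: prefix each number with '#' and collapse, or return "" if no match
result <- if (length(nums) > 0) str_c("#", nums, collapse = ", ") else ""
result
# [1] "#2, #4"
```

The `if`/`else` guards against rows with no match, where `str_c()` with `collapse` would otherwise produce a value that does not fit the intended empty string.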
Or use regex lookarounds to make this more compact:
df2 %>%
  mutate(tmp = map_chr(str_extract_all(N1,
      '(?<=H[#/]?)\\d+|(?<=H# )\\d+|(?<=H #)\\d+'),
    ~ if(length(.x) > 0) str_c('#', as.numeric(.x), collapse=", ") else ""))
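As a rough illustration of what the lookbehind version returns before the `#` prefix is pasted on, the sketch below runs the same pattern on three strings. The first comes from the sample data; the other two are shortened or hypothetical inputs. With a lookbehind, the H and separators are checked but not included in the match, so only the digits come back and no `str_remove_all` step is needed.

```r
library(stringr)

# "H#07" is from the sample data; the other two are illustrative inputs
x <- c("H#07", "W#16 A/H# 2", "W7")

# Each alternative is a lookbehind: the text before the digits must be
# H, H# or H/ (first branch), "H# " (second) or "H #" (third)
digits <- str_extract_all(x, "(?<=H[#/]?)\\d+|(?<=H# )\\d+|(?<=H #)\\d+")

digits[[1]]
# [1] "07"
digits[[2]]
# [1] "2"
length(digits[[3]])
# [1] 0
```

The leading zero in "07" is still present at this stage; it disappears only once `as.numeric()` is applied.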
Upvotes: 3
Reputation: 24810
Here's an alternative approach with the more generic \\W non-word character type:
library(tidyverse)
df2 %>%
  mutate(Outcome = map(str_extract_all(N1,"H\\W*[0-9]+"),
                       ~str_remove_all(.x,"\\D") %>%
                         as.numeric %>%
                         map_chr(~paste0("#",.x))) %>%
           map_chr(~paste(.x,collapse = ", ")))
# A tibble: 8 x 2
N1 Outcome
<chr> <chr>
1 "H#7" "#7"
2 "H#7 W#8" "#7"
3 "H#7,H#8" "#7, #8"
4 "H#07" "#7"
5 "#H/7" "#7"
6 "#/\\W7" ""
7 "W#16 A/H# 2 A/H #4" "#2, #4"
8 "H7 and H8" "#7, #8"
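For a sense of what `\\W` does and does not accept between the H and the digits, here is a small sketch. Only the first input comes from the sample data; the others are hypothetical. `\\W` matches any character that is not a letter, digit or underscore, so it covers `#`, `/`, spaces and commas but stops at `_` or a letter.

```r
library(stringr)

# "H#7" is from the sample data; the remaining strings are illustrative
x <- c("H#7", "H / 7", "H_7", "Hx7")

# TRUE where H is followed by zero or more non-word characters and then digits
matched <- str_detect(x, "H\\W*[0-9]+")
matched
# [1]  TRUE  TRUE FALSE FALSE
```

If the real data can contain separators such as underscores, the character class would need to be widened accordingly.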
Upvotes: 2