Stataq

Reputation: 2297

How to keep a number that is preceded by a specific letter

I have data like the sample below, and I only want to extract the numbers that follow an H.

The sample data can be built with this code:

df2<-structure(list(N1 = c("H#7", "H#7 W#8", "H#7,H#8", "H#07", "#H/7", 
"#/\\W7", "W#16 A/H# 2 A/H #4", "H7 and H8")), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 1

Views: 47

Answers (3)

Marcos Pérez

Reputation: 1250

Try:

library(dplyr)
library(stringr)
library(purrr)
df2 <- df2 %>% mutate(Output = str_extract_all(N1,"H[^H]*\\d+") %>% 
                        map_chr(.,~paste(str_extract(.,pattern="\\d+"),collapse=", ")) )
df2

Output:

# A tibble: 8 x 2
  N1                   Output
  <chr>                <chr> 
1 "H#7"                "7"   
2 "H#7 W#8"            "7"   
3 "H#7,H#8"            "7, 8"
4 "H#07"               "07"  
5 "#H/7"               "7"   
6 "#/\\W7"             ""    
7 "W#16 A/H# 2 A/H #4" "2, 4"
8 "H7 and H8"          "7, 8"

Upvotes: 2

akrun

Reputation: 887213

One option is to extract each substring that starts with 'H', optionally followed by #, / or spaces, and then one or more digits (\\d+). Then strip the H, # and / characters from each list element with str_remove_all, convert the result to numeric, and use an if/else condition to paste # as a prefix, returning an empty string when nothing matched.

library(dplyr)
library(stringr)
library(purrr)
df2 %>%
   mutate(Outcome = map_chr(str_extract_all(N1, "H[#/ ]*?\\d+"),
     ~ {
     tmp <- as.numeric(str_remove_all(.x, "[H#/]"))
       if(length(tmp) > 0) str_c("#", tmp, collapse=", ") else ""
   }))

Output:

# A tibble: 8 x 2
#  N1                   Outcome 
#  <chr>                <chr>   
#1 "H#7"                "#7"    
#2 "H#7 W#8"            "#7"    
#3 "H#7,H#8"            "#7, #8"
#4 "H#07"               "#7"    
#5 "#H/7"               "#7"    
#6 "#/\\W7"             ""      
#7 "W#16 A/H# 2 A/H #4" "#2, #4"
#8 "H7 and H8"          "#7, #8"

Or use a regex lookbehind to make this more compact:

df2 %>%
 mutate(tmp = map_chr(str_extract_all(N1,
       '(?<=H[#/]?)\\d+|(?<=H# )\\d+|(?<=H #)\\d+'), 
   ~ if(length(.x) > 0)  str_c('#', as.numeric(.x), collapse=", ") else ""))
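
For readers without the tidyverse installed, the same extract-then-clean idea can be sketched in base R. This is only an illustration, not part of the answer above: the helper name extract_h_numbers is hypothetical, and the pattern assumes that only #, / and spaces may sit between the H and the digits, as in the sample data.

```r
# Sample strings from the question
x <- c("H#7", "H#7 W#8", "H#7,H#8", "H#07", "#H/7",
       "#/\\W7", "W#16 A/H# 2 A/H #4", "H7 and H8")

# Hypothetical helper (name chosen for illustration only)
extract_h_numbers <- function(x) {
  # Find every "H", then any run of "#", "/" or spaces, then digits
  hits <- regmatches(x, gregexpr("H[#/ ]*[0-9]+", x))
  vapply(hits, function(h) {
    # No H-number in this string: return an empty string
    if (length(h) == 0) return("")
    # Keep only the digits; as.numeric drops leading zeros
    nums <- as.numeric(gsub("[^0-9]", "", h))
    paste0("#", nums, collapse = ", ")
  }, character(1))
}

extract_h_numbers(x)
```

The result is a plain character vector, so it can be assigned directly with df2$Outcome <- extract_h_numbers(df2$N1).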

Upvotes: 3

Ian Campbell

Reputation: 24810

Here's an alternative approach that uses the more generic \\W (non-word character) class:

library(tidyverse)
df2 %>% 
  mutate(Outcome = map(str_extract_all(N1,"H\\W*[0-9]+"),
                       ~str_remove_all(.x,"\\D") %>% 
                         as.numeric %>%
                         map_chr(~paste0("#",.x))) %>% 
                      map_chr(~paste(.x,collapse = ", ")))

Output:

# A tibble: 8 x 2
  N1                   Outcome 
  <chr>                <chr>   
1 "H#7"                "#7"    
2 "H#7 W#8"            "#7"    
3 "H#7,H#8"            "#7, #8"
4 "H#07"               "#7"    
5 "#H/7"               "#7"    
6 "#/\\W7"             ""      
7 "W#16 A/H# 2 A/H #4" "#2, #4"
8 "H7 and H8"          "#7, #8"

Upvotes: 2
