R - Splitting strings in a column on a character and keeping specific results

Question

This link is 90% of the way to solving what I want to figure out: R Split String By Delimiter in a column

Here's the example input:

A               B       C    
awer.ttp.net    Code    554
abcd.ttp.net    Code    747
asdf.ttp.net    Part    554
xyz.ttp.net     Part    747
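To reproduce the examples, the sample data can be built as a data frame. This is a minimal sketch. It uses the name df1, as the answer does (the question's own code calls it df), and storing C as numeric is an assumption.

```r
# Sample data from the question; named df1 to match the answer's code
df1 <- data.frame(
  A = c("awer.ttp.net", "abcd.ttp.net", "asdf.ttp.net", "xyz.ttp.net"),
  B = c("Code", "Code", "Part", "Part"),
  C = c(554, 747, 554, 747),  # assumed numeric
  stringsAsFactors = FALSE    # keep A as character on R < 4.0 too
)
```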

The linked answer's solution and its result:

library(dplyr)
# drop everything from the first dot onward
df <- df %>% mutate(D = gsub("\\..*", "", A))

A             B     C    D
awer.ttp.net  Code  554  awer
abcd.ttp.net  Code  747  abcd
asdf.ttp.net  Part  554  asdf
xyz.ttp.net   Part  747  xyz

But this only keeps the string before the first dot. What if you want the part between the first and second dots instead, as below?

A             B     C    D
awer.ttp.net  Code  554  ttp
abcd.ttp.net  Code  747  ttp
asdf.ttp.net  Part  554  ttp
xyz.ttp.net   Part  747  ttp
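Since the title asks about splitting on a character and keeping a specific piece, another option is to split with base R's strsplit() and take the n-th element. This is a sketch, not taken from the answer; nth_field is a hypothetical helper name.

```r
# Hypothetical helper: return the n-th dot-separated field of each string
nth_field <- function(x, n, sep = ".") {
  # fixed = TRUE treats sep literally, so "." is not a regex wildcard
  parts <- strsplit(x, sep, fixed = TRUE)
  # indexing past the end of a piece gives NA instead of an error
  vapply(parts, function(p) p[n], character(1))
}

hosts <- c("awer.ttp.net", "abcd.ttp.net", "asdf.ttp.net", "xyz.ttp.net")
nth_field(hosts, 1)  # c("awer", "abcd", "asdf", "xyz")
nth_field(hosts, 2)  # c("ttp", "ttp", "ttp", "ttp")
```

With the question's data frame, df$D <- nth_field(df$A, 2) would fill D with "ttp". Strings with fewer fields get NA rather than being passed through unchanged.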

akrun · Accepted Answer

We can capture the field as a group. The pattern matches one or more characters that are not a dot ([^.]+) from the start of the string (^), followed by a literal dot, followed by another run of non-dot characters captured as a group (([^.]+)), followed by a dot and the rest of the string. The whole match is then replaced with the backreference to the captured group (\1, written "\\1" inside an R string).

library(dplyr)
df1 %>%
    mutate(D = sub("^[^.]+\\.([^.]+)\\..*", "\\1", A))
#             A    B   C   D
#1 awer.ttp.net Code 554 ttp
#2 abcd.ttp.net Code 747 ttp
#3 asdf.ttp.net Part 554 ttp
#4  xyz.ttp.net Part 747 ttp
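One behavior worth knowing: sub() returns a string unchanged when the pattern does not match, and the two patterns in this answer differ on strings with only one dot. A small check on a made-up vector:

```r
x <- c("awer.ttp.net", "abc.net", "nodots")

# The sub() pattern above requires a second dot after the captured field
sub("^[^.]+\\.([^.]+)\\..*", "\\1", x)
# c("ttp", "abc.net", "nodots"): non-matching strings come back unchanged

# The extract() pattern ends in .* instead of \\..*, so it also accepts
# a string with a single dot
sub("^[^.]+\\.([^.]+).*", "\\1", x)
# c("ttp", "net", "nodots")
```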

Or, using extract from tidyr:

library(tidyr)
df1 %>% 
   extract(A, into = 'D', "^[^.]+\\.([^.]+).*", remove = FALSE)

Note that dplyr isn't needed for this; base R alone works:

df1$D <- sub("^[^.]+\\.([^.]+)\\..*", "\\1", df1$A)
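If returning the input unchanged on a non-match is not what you want, one base R alternative (not part of the answer) is regmatches() with regexec(), which yields NA for strings that don't match. A sketch on a made-up vector:

```r
x <- c("awer.ttp.net", "abc.net")

m <- regmatches(x, regexec("^[^.]+\\.([^.]+)\\.", x))
# each element is c(full match, group 1), or character(0) when there is
# no match; indexing character(0) at position 2 gives NA
D <- vapply(m, function(g) g[2], character(1))
D  # c("ttp", NA)
```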
