Reputation: 369
This link is 90% of the way to solving what I want to figure out: R Split String By Delimiter in a column
Here's the example input:
A B C
awer.ttp.net Code 554
abcd.ttp.net Code 747
asdf.ttp.net Part 554
xyz.ttp.net Part 747
And the code from the linked answer, with the result it gives:
library(dplyr)
df = df %>% mutate(D=gsub("\\..*","",A))
A B C D
awer.ttp.net Code 554 awer
abcd.ttp.net Code 747 abcd
asdf.ttp.net Part 554 asdf
xyz.ttp.net Part 747 xyz
But this only gives you the string before the first dot. What if you want the following?
A B C D
awer.ttp.net Code 554 ttp
abcd.ttp.net Code 747 ttp
asdf.ttp.net Part 554 ttp
xyz.ttp.net Part 747 ttp
Upvotes: 1
Views: 1386
Reputation: 488
You can use the strsplit function for this, and wrap it in a function that returns the part you want.
Make your dataframe
temp <- "A B C
awer.ttp.net Code 554
abcd.ttp.net Code 747
asdf.ttp.net Part 554
xyz.ttp.net Part 747
"
df <- read.table(textConnection(temp), header=TRUE, as.is=TRUE )
We want to use the strsplit function, which splits a string at a given pattern and returns a list containing a vector of the resulting pieces. For instance:
strsplit("A-B-C-D", "-")
#[[1]]
#[1] "A" "B" "C" "D"
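One detail that matters for the dotted names in this question: strsplit treats its second argument as a regular expression by default, so a bare "." matches every character. A minimal sketch of the difference, using one of the sample values (the variable names are only for illustration):

```r
host <- "awer.ttp.net"

# An unescaped "." is a regex wildcard, so every character acts as a delimiter
# and only empty strings are left
wild <- strsplit(host, ".")[[1]]

# Escape the dot, or pass fixed = TRUE, to split on a literal "."
escaped <- strsplit(host, "\\.")[[1]]
literal <- strsplit(host, ".", fixed = TRUE)[[1]]
```

This is why the pattern used later in this answer is '\\.' rather than '.'.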
Wrap this into a function that returns a specified part
mystrsplit <- function(x, pattern, part=2){
return(strsplit(x, pattern)[[1]][part])
}
# Vectorize it so that it can handle vector arguments of x
mystrsplit <- Vectorize(mystrsplit, vectorize.args = "x")
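As an alternative sketch, since strsplit already accepts a vector, the same idea can be written without Vectorize by indexing each list element with vapply. The name nth_piece is hypothetical and not part of the answer above:

```r
# Hypothetical helper: return the requested piece of each string in x
nth_piece <- function(x, pattern, part = 2) {
  pieces <- strsplit(x, pattern)
  # vapply enforces a character result; an out-of-range index yields NA
  vapply(pieces, function(p) p[part], character(1))
}

result <- nth_piece(c("awer.ttp.net", "xyz.ttp.net"), "\\.", 2)
missing_part <- nth_piece("nodots", "\\.", 2)
```

Strings with fewer pieces than requested come back as NA instead of raising an error, which may or may not be what you want for malformed input.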
Use our mystrsplit function in mutate:
library(dplyr)
df %>% mutate(D=mystrsplit(A, '\\.', 2))
# A B C D
#1 awer.ttp.net Code 554 ttp
#2 abcd.ttp.net Code 747 ttp
#3 asdf.ttp.net Part 554 ttp
#4 xyz.ttp.net Part 747 ttp
Upvotes: 0
Reputation: 887511
We can capture the part we want as a group. Match one or more characters that are not a dot ([^.]+) from the beginning (^) of the string, followed by a literal dot (\\.), followed by another run of characters that are not a dot, captured as a group (([^.]+)), followed by a dot and any remaining characters (\\..*). Then replace the whole match with the backreference (\\1) to the captured group.
library(dplyr)
df1 %>%
mutate(D= sub("^[^.]+\\.([^.]+)\\..*", "\\1", A))
# A B C D
#1 awer.ttp.net Code 554 ttp
#2 abcd.ttp.net Code 747 ttp
#3 asdf.ttp.net Part 554 ttp
#4 xyz.ttp.net Part 747 ttp
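To see how the pattern behaves on individual strings, here is a small sketch in base R. The second input, "ttp.net", is a made-up value that is not in the sample data, chosen to show what happens when there is only one dot:

```r
pattern <- "^[^.]+\\.([^.]+)\\..*"

# Three dot-separated parts: the whole string matches and is replaced by group 1
three <- sub(pattern, "\\1", "awer.ttp.net")

# Only one dot: the pattern requires a second dot, so nothing matches
# and sub() returns the input unchanged
two <- sub(pattern, "\\1", "ttp.net")
```

Because sub leaves non-matching strings untouched, values that don't have at least two dots would pass through into column D as-is.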
Or using extract
library(tidyr)
df1 %>%
extract(A, into = 'D', "^[^.]+\\.([^.]+).*", remove = FALSE)
Note that we don't need dplyr for this; sub from base R is enough:
df1$D <- sub("^[^.]+\\.([^.]+)\\..*", "\\1", df1$A)
Upvotes: 1