Reputation: 4243
I have a dataframe like below:
Col1 Col2
A 5!5!!6!!3!!m
B 7_8!!6!!7!!t
structure(list(Col1 = c("A", "B"), Col2 = c("5!5!!6!!3!!m", "7_8!!6!!7!!t" )), class = "data.frame", row.names = c(NA, -2L))
How do I create a new column that extracts the 3rd parse of the strings found in Col2?
In SQL I am using SPLIT_PART function:
SPLIT_PART(Col2, '!!', 3)
I am looking to find an equivalent function in R.
Expected output:
Col1 Col2 Col3
A 5!5!!6!!3!!m 3
B 7_8!!6!!7!!t 7
Upvotes: 3
Views: 47
Reputation: 887148
We can use str_extract
to extract the digits
library(stringr)
df1 %>%
mutate(Col3 = as.numeric(str_extract(Col2, "\\d+(?=!![a-z]+$)")))
# Col1 Col2 Col3
#1 A 5!5!!6!!3!!m 3
#2 B 7_8!!6!!7!!t 7
If we need it by position, then
df1$Col3 <- as.numeric(sapply(strsplit(df1$Col2, "!!", fixed = TRUE), `[`, 3))
df1$Col3
#[1] 3 7
Or using gsubfn
to create a position identifier and then extract the numbers before it
library(gsubfn)
p <- proto(fun = function(this, x) if(count == 3) paste0(";", x))
as.numeric(str_extract(gsubfn("(!!)", p, df1$Col2), "\\d+(?=;)"))
#[1] 3 7
df1 <- structure(list(Col1 = c("A", "B"), Col2 = c("5!5!!6!!3!!m", "7_8!!6!!7!!t"
)), class = "data.frame", row.names = c(NA, -2L))
Upvotes: 1
Reputation: 12155
Here's a tidyverse
option, though the core is funcitonally identical to Rushabh's data.table based answer.
When given the simplify=T
argument, stringr::str_split
will output a matrix, with each match in a column. You can subset the desired column from it to extract the desired position:
library(tidyverse)
df1 %>%
mutate(Col3 = str_split(Col2, pattern = '!!', simplify=T)[,3])
Col1 Col2 Col3
1 A 5!5!!6!!3!!m 5!5
2 B 7_8!!6!!7!!t 7_8
df1 %>%
mutate(Col3 = str_split(Col2, pattern = '!!', simplify=T)[,2])
Col1 Col2 Col3
1 A 5!5!!6!!3!!m 6
2 B 7_8!!6!!7!!t 6
df1 %>%
mutate(Col3 = str_split(Col2, pattern = '!!', simplify=T)[,1])
Col1 Col2 Col3
1 A 5!5!!6!!3!!m 5!5
2 B 7_8!!6!!7!!t 7_8
Upvotes: 1
Reputation: 2764
You can use str_split
from stringr
package-
> library(stringr)
> library(data.table)
> setDT(dt)[,Col3:=sapply(Col2,function(x) unlist(str_split(x,"!!"))[3])]
Output-
> dt
Col1 Col2 Col3
1: A 5!5!!6!!3!!m 3
2: B 7_8!!6!!7!!t 7
Note- You can change position
from 3rd
to nth
in function
.
Upvotes: 1