Reputation: 77
I want to create an additional column in a dataframe, which is a substring from an existing column in the dataframe, but using different start and end points for each row.
Specifically, the column "codes" in the example below contains a single colon character ":" somewhere in the string. This location varies in each string. I want to take the two characters before and two characters after the colon, as well as the colon.
An example of what I currently have:
letters <- c("A", "B", "C")
codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf")
df <- data.frame(letters, codes)
print(df)
  letters              codes
1       A lksjdfi99:99lksjdf
2       B      nsj78:12osjsm
3       C      a12:67opaidsf
This is an example of what I would like to have:
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
Any help would be appreciated.
Upvotes: 2
Views: 308
Reputation: 79204
Here is a tidyverse solution:
library(tidyr)
library(readr)
library(dplyr)
df %>%
  separate(codes, c("split1", "split2"), remove = FALSE) %>%
  mutate(across(starts_with("split"), parse_number)) %>%
  mutate(new_col = paste(split1, split2, sep = ":"), .keep = "unused")
output:
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
Upvotes: 1
Reputation: 887841
We can use sub in base R:
df$new_col <- sub("\\D+(\\d+:\\d+)\\D+", "\\1", df$codes)
-output
> df
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
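Since the question asks for exactly two characters on each side of the colon, a position-based variant is also possible. This is only a minimal sketch, assuming every string contains one colon with at least two characters on either side of it; the names pos and new_col are just for illustration:

```r
codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf")

# regexpr() gives the 1-based position of the first ":" in each string;
# fixed = TRUE treats ":" as a literal rather than a regular expression
pos <- as.integer(regexpr(":", codes, fixed = TRUE))

# substr() is vectorised over start and stop, so each string
# gets its own window of two characters either side of the colon
new_col <- substr(codes, pos - 2, pos + 2)
new_col
```

Unlike the \\d+ patterns, this takes whatever two characters surround the colon, digits or not, so it follows the wording of the question more literally.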
Upvotes: 2
Reputation: 79338
You could also do:
library(tidyverse)
df <- df %>%
  extract(codes, 'new_col', '(\\d+:\\d+)', remove = FALSE)
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
Upvotes: 1
Reputation: 21442
You can use str_extract:
library(stringr)
df$new_col <- str_extract(df$codes, "\\d+:\\d+")
df
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
Alternatively you can use:
str_replace(df$codes,".*(\\d{2}:\\d{2}).*", "\\1")
or, in base R:
gsub(".*(\\d{2}:\\d{2}).*", "\\1", df$codes)
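One thing that may matter in practice is what happens to strings with no match: str_extract returns NA for them, while the replace-style calls hand the input back unchanged. Below is a small base R sketch of that difference; the input "nocolon" is made up for illustration, and the regexpr/regmatches step is just one possible way to get NA in base R:

```r
x <- c("nsj78:12osjsm", "nocolon")  # "nocolon" is a made-up non-matching input

# gsub() leaves a string untouched when the pattern does not match
replaced <- gsub(".*(\\d{2}:\\d{2}).*", "\\1", x)

# To get NA instead: locate the matches, then fill in only the hits
m <- regexpr("\\d{2}:\\d{2}", x)
extracted <- rep(NA_character_, length(x))
extracted[m != -1] <- regmatches(x, m)  # regmatches() returns matched strings only

replaced
extracted
```

If the data might contain rows without a colon, the NA-returning form is probably the safer one, since an unchanged string in new_col is easy to overlook.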
Upvotes: 1