lf208

Reputation: 77

How to substring a column in R using different character positions for each row

I want to add a column to a data frame that is a substring of an existing column, but with different start and end points in each row.

Specifically, each string in the column "codes" in the example below contains a single colon ":", and its position varies from string to string. I want to take the two characters before the colon, the colon itself, and the two characters after it.

An example of what I currently have:

letters <- c("A", "B", "C")

codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf")

df <- data.frame(letters, codes)

print(df)

  letters              codes
1       A lksjdfi99:99lksjdf
2       B      nsj78:12osjsm
3       C      a12:67opaidsf

This is an example of what I would like to have:

  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
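For reference, the window described above can also be taken literally: find the colon in each string, then pass per-row start and stop positions to substr(), which is vectorised over both. A minimal base R sketch, assuming exactly one colon per string as stated above:

```r
codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf")

# Position of the colon in each string; regexpr() gives -1 where there is none.
# fixed = TRUE treats ":" as a literal character, not a regular expression.
colon_pos <- as.integer(regexpr(":", codes, fixed = TRUE))

# substr() accepts a different start/stop for every element:
# two characters before the colon through two characters after it.
new_col <- substr(codes, colon_pos - 2, colon_pos + 2)

# Strings without a colon would get a meaningless window, so mark them NA.
new_col[colon_pos == -1] <- NA

print(new_col)
```

Assigning the result with df$new_col <- substr(df$codes, ...) in the same way adds it as a column. If the colon sits within the first two characters, substr() simply starts at position 1, so that row comes back shorter than five characters.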

Any help would be appreciated.

Upvotes: 2

Views: 308

Answers (4)

TarJae

Reputation: 79204

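Here is a tidyverse solution:

library(tidyr)
library(readr)
library(dplyr)
df %>% 
  # split codes at the colon (separate()'s default sep is any run of non-alphanumeric characters)
  separate(codes, c("split1", "split2"), remove=FALSE) %>% 
  # pull the number out of each piece; this returns numbers, so "07" would become 7
  mutate(across(starts_with("split"), parse_number)) %>% 
  # glue the two numbers back together; .keep="unused" drops split1 and split2
  mutate(new_col= paste(split1, split2, sep=":"), .keep="unused")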

Output:

  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
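The same split-then-trim idea can be sketched in base R without readr. Unlike parse_number(), which returns numbers (so a piece like "07" would turn into 7), this version keeps everything as strings. A minimal sketch, assuming exactly one colon per string:

```r
codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf")

# Split each string at the colon into a "before" and an "after" piece.
parts  <- strsplit(codes, ":", fixed = TRUE)
before <- vapply(parts, `[`, character(1), 1)
after  <- vapply(parts, `[`, character(1), 2)

# Keep the last two characters of the first piece and the first two of the second.
n <- nchar(before)
new_col <- paste0(substr(before, n - 1, n), ":", substr(after, 1, 2))

print(new_col)
```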

Upvotes: 1

akrun

Reputation: 887841

We can use sub in base R, capturing the digits:digits part and replacing the whole string with that capture:

df$new_col <- sub("\\D+(\\d+:\\d+)\\D+", "\\1", df$codes)

Output:

> df
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
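One thing to keep in mind with sub(): a string that does not match is returned unchanged rather than as NA. Because \\D+ needs at least one non-digit on each side, a string that starts with the digits, such as the hypothetical "12:34abc", does not match either. A small sketch that turns such rows into NA (the last two strings are made up for illustration):

```r
codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf",
           "no colon here", "12:34abc")  # last two are hypothetical non-matches

pattern <- "\\D+(\\d+:\\d+)\\D+"

# sub() leaves non-matching strings as they are ...
replaced <- sub(pattern, "\\1", codes)

# ... so use grepl() to replace those rows with NA.
new_col <- ifelse(grepl(pattern, codes), replaced, NA_character_)

print(new_col)
```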

Upvotes: 2

Onyambu

Reputation: 79338

You could also do:

library(tidyverse)
# extract() puts the first capture group into new_col; remove = FALSE keeps codes
df <- df %>% 
   extract(codes, 'new_col', '(\\d+:\\d+)', remove = FALSE)
df
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
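If you would rather stay in base R, utils::strcapture() plays a role similar to extract(): each capture group in the pattern becomes a column, typed after a prototype data frame, and rows with no match get NA. This is a base R alternative, not part of tidyr; a minimal sketch:

```r
df <- data.frame(letters = c("A", "B", "C"),
                 codes = c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf"),
                 stringsAsFactors = FALSE)

# One capture group -> one column, named and typed after the 'proto' template.
captured <- strcapture("(\\d+:\\d+)", df$codes,
                       proto = data.frame(new_col = character()))

# Bind the captured column next to the original ones.
df <- cbind(df, captured)
print(df)
```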

Upvotes: 1

Chris Ruehlemann

Reputation: 21442

You can use str_extract:

library(stringr)
df$new_col <- str_extract(df$codes, "\\d+:\\d+")
df
  letters              codes new_col
1       A lksjdfi99:99lksjdf   99:99
2       B      nsj78:12osjsm   78:12
3       C      a12:67opaidsf   12:67
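str_extract() returns NA for strings without a match. The closest base R equivalent is regexpr() plus regmatches(), but regmatches() silently drops the non-matching elements, so the NA has to be put back by hand. A minimal sketch, with a hypothetical non-matching fourth string:

```r
codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf",
           "no colon here")  # hypothetical non-matching row

# regexpr() finds the first match in each string (-1 where there is none).
m <- regexpr("\\d+:\\d+", codes)

# regmatches() returns only the matched substrings, so it can be shorter than
# the input; start from all-NA and fill in the rows that matched.
new_col <- rep(NA_character_, length(codes))
new_col[m != -1] <- regmatches(codes, m)

print(new_col)
```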

Alternatively, you can use:

str_replace(df$codes, ".*(\\d{2}:\\d{2}).*", "\\1")

or, in base R:

gsub(".*(\\d{2}:\\d{2}).*", "\\1", df$codes)
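The \\d{2} patterns only accept digits. Since the question asks for any two characters on either side of the colon, swapping \\d for . is a small variation. A sketch, with a hypothetical letters-only fourth string to show the difference:

```r
codes <- c("lksjdfi99:99lksjdf", "nsj78:12osjsm", "a12:67opaidsf",
           "abXY:Zwcd")  # hypothetical string with letters around the colon

# The greedy ".*" backtracks just far enough for ".{2}:.{2}" to match, so the
# capture is the two characters either side of the colon; "." accepts any character.
new_col <- sub(".*(.{2}:.{2}).*", "\\1", codes)

print(new_col)
```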

Upvotes: 1
