CeC

Reputation: 85

How to split a string (in a column) using 2 different conditions into 2 separate columns and only keep those 2 columns?

I have a column of strings that's like this:

| Image |
| --- |
| CR 00_01_01 |
| SF 45_04_07 |
| etc. |

I want to get an end result of this:

| Condition | Time |
| ---       | ---  |
| CR        | 00   |

I currently do this in two steps, which feels cumbersome: I split the string first on the space and then on the underscore.

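# Split on the space: the condition code, then the remaining "00_01_01" part
df[, c("Condition", "Rest") := tstrsplit(Image, " ", fixed = TRUE)]
# Split the remainder on "_" and keep only the first piece
df[, Time := tstrsplit(Rest, "_", fixed = TRUE, keep = 1L)]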

Is there any better way of doing this?

Upvotes: 0

Views: 952

Answers (2)

Andrew

Reputation: 5138

Here is a strsplit solution that should do what you are looking for: split on either a space or an underscore, then select the first two elements.

split_string <- strsplit(df1$Image, split = "\\s|_")

data.frame(Condition = sapply(split_string, `[`, 1),
           Time = sapply(split_string, `[`, 2))

  Condition Time
1        CR   00
2        SF   45
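
Since the question uses data.table, the same split-on-either-delimiter idea can probably be done in one call there. This is a minimal sketch, assuming `df` is a data.table with a character column `Image` (the sample data is recreated here) and relying on `tstrsplit` passing the pattern through to `strsplit`:

```r
library(data.table)

# Sample data recreated for illustration
df <- data.table(Image = c("CR 00_01_01", "SF 45_04_07"))

# Split on a space or an underscore, keep only the first two pieces,
# and return them as a new two-column table
res <- df[, tstrsplit(Image, "[ _]", keep = 1:2, names = c("Condition", "Time"))]
res
```

Because `j` returns a list here instead of using `:=`, the result should contain only `Condition` and `Time`, and `Time` stays a character column, so the leading zero in `00` is preserved.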

If the format of the Image column is always the same, you could extract based on position.

data.frame(Condition = substr(df1$Image, 1, 2),
           Time = substr(df1$Image, 4, 5))

  Condition Time
1        CR   00
2        SF   45
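
Position-based extraction silently returns the wrong characters if a row deviates from the layout, so it may be worth checking the format first. This is a small sketch, assuming every value should look like two uppercase letters, a space, two digits, then an underscore (a pattern inferred from the sample rows, not stated in the question):

```r
df1 <- data.frame(Image = c("CR 00_01_01", "SF 45_04_07"), stringsAsFactors = FALSE)

# TRUE for rows matching the assumed fixed-width layout
fits_layout <- grepl("^[A-Z]{2} [0-9]{2}_", df1$Image)

# Stop early if any row would make substr() pick the wrong characters
stopifnot(all(fits_layout))
```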

Or you could just use regex to extract the letters / first pair of numbers.

data.frame(Condition = gsub("^([[:alpha:]]+).*", "\\1", df1$Image),
           Time = gsub(".*[[:space:]]([[:digit:]]+)_.*", "\\1", df1$Image))

  Condition Time
1        CR   00
2        SF   45
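
One caveat with the gsub approach: when the pattern does not match, gsub returns the input unchanged, so malformed rows pass through unnoticed. As an alternative sketch, base R's `strcapture` (from utils) returns both capture groups at once and gives `NA` for rows that do not match. The third, malformed value below is made up to show that behaviour:

```r
x <- c("CR 00_01_01", "SF 45_04_07", "unexpected")

# One capture group per output column; proto fixes the column names and types
res <- strcapture(
  "^([[:alpha:]]+)[[:space:]]([[:digit:]]+)_",
  x,
  proto = data.frame(Condition = character(), Time = character(),
                     stringsAsFactors = FALSE)
)
res
```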

Data:

df1 <- data.frame(Image = c("CR 00_01_01", "SF 45_04_07"), stringsAsFactors = FALSE)

Upvotes: 1

boski

Reputation: 2467

You can try this using dplyr and tidyr:

library(dplyr); library(tidyr)
df %>% separate(image, c("Image", "Time"), " ") %>%
  mutate(Time = sub("([0-9]+).*", "\\1", Time))

  Image Time
1    CR   00
2    SF   45
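
The two steps can likely be collapsed by giving `separate` a regular expression as `sep` and dropping the leftover pieces. This is a sketch, assuming the column is named `image` as in the data below and that `Condition` is the desired name for the first column:

```r
library(tidyr)

df <- data.frame(image = c("CR 00_01_01", "SF 45_04_07"), stringsAsFactors = FALSE)

# Split on a space or underscore; extra = "drop" discards pieces beyond the first two
res <- separate(df, image, into = c("Condition", "Time"), sep = "[ _]", extra = "drop")
res
```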

Data

df <- structure(list(image = c("CR 00_01_01", "SF 45_04_07")), class = "data.frame", row.names = c(NA, -2L))

Upvotes: 1
