Kriti
Kriti

Reputation: 45

Extract data from a column based on another column in df

I have the following df without the choice_text column. I want to add the new column 'choice_text' which extracts the value in column 3 before ":" based on the number in column 2. So if choice is 1 then choice_text shows 15 mins.

ID Choice Options Choice_text
X 1 15 mins: 1, 30 mins: 2, 45 mins: 3
Y 2 15 mins: 1, 30 mins: 2, 45 mins: 3
Z 3 15 mins: 1, 30 mins: 2, 45 mins: 3

Upvotes: 1

Views: 122

Answers (2)

akrun
akrun

Reputation: 887981

We can use str_extract_all which is vectorized to extract the digits (\\d+) followed by the 'mins' and : along with the 'Choice' column values by pasteing the pattern with the 'Choice' column, then remove the suffix part with str_remove

library(dplyr)
library(stringr)
df1 %>%
   mutate(Choice_text = str_remove(str_extract(Options,
        str_c("\\d+ mins: ",  Choice)), ":\\s+\\d+$"))

-output

#  ID Choice                            Options Choice_text
#1  X      1 15 mins: 1, 30 mins: 2, 45 mins: 3     15 mins
#2  Y      2 15 mins: 1, 30 mins: 2, 45 mins: 3     30 mins
#3  Z      3 15 mins: 1, 30 mins: 2, 45 mins: 3     45 mins

data

df1 <- structure(list(ID = c("X", "Y", "Z"), Choice = 1:3, 
  Options = c("15 mins: 1, 30 mins: 2, 45 mins: 3", 
"15 mins: 1, 30 mins: 2, 45 mins: 3", "15 mins: 1, 30 mins: 2, 45 mins: 3"
)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

r2evans
r2evans

Reputation: 161110

Base R:

trimws(gsub(":.*", "",
  mapply(`[`, strsplit(df$Options, ","), df$Choice)
))
# [1] "15 mins" "30 mins" "45 mins"

(This has been fixed so that if Choice is ever larger than the number of Options, it will return NA instead of failing.)

Walk-through:

  • strsplit splits the strings by ,, resulting in a nested list (3-long, 1 for each row) each with 3 substrings (in this case, this 3 is different than nrow(df) 3);

  • the mapply(.) portion does a "zipping" of the options; for instance, the mapply line is equivalent to:

    o <- strsplit(df$Options, ",")
    c(
      o[[1]][ df$Choice[1] ],
      o[[2]][ df$Choice[2] ],
      o[[3]][ df$Choice[3] ]
    #       ^^ this `[` is what the first argument to `mapply` is referring
    )
    
  • gsub removes the : and beyond;

  • trimws cleans up surround blank-space

Upvotes: 2

Related Questions