Rstudyer
Rstudyer

Reputation: 477

Subset text and fill out results in cols based on multiple conditions in R

Let us say the dataset I have is like this:

label<-c(" "," ", " ", " ", " "," "," ")
def<-c("54.abc","55a.yy?(choice=agd)","55a.yy?(choice=sbs)","55a.yy?(choice=was)","55a.yy?(choice=bbs)","56b.zz?(choice=ksx)","56b.zz?(coice=ixi")
term<-c("","","","","","","")
parentlabel<-c("","","","","","","")
df<-data.frame(label, def, term, parentlabel)

So the initial dataset looks like this:

label     def                    term   parentlabel
          54.abc
          55a.yy?(choice=agd)
          55a.yy?(choice=sbs)
          55a.yy?(choice=was)
          55a.yy?(choice=bbs)
          56b.zz?(choice=ksx)
          56b.zz?(choice=ixi)

What I want to do is:

So the ideal output would look like:

label     def                    term                  parentlabel
          54.abc
55a.yy    question55a            multifilter
agd       55a.yy?(choice=agd)    value                  question55a
sbs       55a.yy?(choice=sbs)    value                  question55a
was       55a.yy?(choice=was)    value                  question55a
bbs       55a.yy?(choice=bbs)    value                  question55a
56b.zz    question56b           multifilter
ksx       56b.zz?(choice=ksx)    value                  question56b
ixi       56b.zz?(choice=ixi)    value                  question56b

It is a little bit complicated process,I can only write the process but fail to convert them as a chunk of code. Hence, I appreciate it if someone could help. Thanks a lot~~!

Upvotes: 1

Views: 34

Answers (1)

Martin Gal
Martin Gal

Reputation: 16988

I'm not convinced that this approach is very stable applied to a larger dataset, but it could give you some clues how to handle this:

library(dplyr)
library(stringr)
library(purrr)

df %>% 
  group_by(grp = str_extract(def, "^\\d{2}\\w*.\\w{2}")) %>% 
  group_split() %>% 
  map_dfr(
    ~add_row(
      .x, 
      label = unique(.$grp), 
      def = paste0("question", str_extract(label, ".*(?=\\.)")),
      term = "multifilter",
      .before = 1) %>% 
      mutate(
        label = ifelse(!is.na(grp), str_extract(def, "(?<=\\w=)\\w*"), label),
        term = ifelse(!is.na(grp), "value", term),
        parentlabel = ifelse(!is.na(grp), first(def), parentlabel)
        )
    ) %>% 
  select(-grp) %>% 
  filter(!str_detect(def, "\\d$")) %>% 
  mutate(across(c(term, parentlabel), 
                ~ ifelse(str_detect(def, "\\d+\\."), NA, .x))
  )

By using a lot of tidyverse functions, you get

# A tibble: 9 x 4
  label  def                 term        parentlabel
  <chr>  <chr>               <chr>       <chr>      
1 NA     54.abc              NA          NA         
2 55a.yy question55a         multifilter NA         
3 agd    55a.yy?(choice=agd) value       question55a
4 sbs    55a.yy?(choice=sbs) value       question55a
5 was    55a.yy?(choice=was) value       question55a
6 bbs    55a.yy?(choice=bbs) value       question55a
7 56b.zz question56b         multifilter NA         
8 ksx    56b.zz?(choice=ksx) value       question56b
9 ixi    56b.zz?(coice=ixi)  value       question56b

Upvotes: 1

Related Questions