How to reconfigure a data frame: part as is, part in wide format with new columns?

Question

I have a data frame that looks like this (short version):

    dat <- data.frame(matrix(NA, nrow = 105, ncol = 2))
    colnames(dat) <- c("1","2")
    dat[,1] <- c("HeaderStart","LevelName","LevelName","LevelName","LevelName","LevelName","Experiment","SessionTime","Subject","DataFileBasename",        
                 "Group","HeaderEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
                 "PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
                 "DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
                 "PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",          
                 "DiceRollRT","DiceRollCRESP","LogFrameEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
                 "PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
                 "DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
                 "PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",          
                 "DiceRollRT","DiceRollCRESP","LogFrameEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
                 "PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
                 "DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
                 "PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",          
                 "DiceRollRT","DiceRollCRESP","LogFrameEnd")

    dat[,2] <- c("HeaderStart","Session","Trial","LogLevel5","LogLevel7","LogLevel9","GameOfDice_CATCH","10:39:59","999","GameOfDice_CATCH-999-1",
                 "1","HeaderEnd","LogFrameStart","5","TrialList","1","Button","199369","231578","32209","","200","367","-999999","0","","0","3",                     
                 "TrialProc","Two","1","1200","66","-999999","0","TwoThreeFourFive","0","1300","241869","0","0","","LogFrameEnd","LogFrameStart",         
                 "4","TrialList","3","Button","246519","248704","2185","","500","281","-999999","0","","0","3","TrialProc","Two","1",                     
                 "1800","117","-999999","0","ThreeFourFiveSix","0","1700","264386","0","0","","LogFrameEnd","LogFrameStart","5",                     
                 "TrialList","5","Button","269069","272355","3286","","1000","285","-999999","0","","0","3","TrialProc","Five","1","2700",                  
                 "84","-999999","0","OneTwoThree","0","2500","282436","0","0","","LogFrameEnd")

[Image: Original Data]

How can I take all of the data between each "LogFrameStart" and "LogFrameEnd" pair and place it in a new data frame that looks like this?

[Image: Expected Output]

Edit/Answer:

I ended up writing a for loop instead, which solved the problem:

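    # Collect the 29 rows between each "LogFrameStart" and its "LogFrameEnd"
    # (assumes every trial block has exactly 29 rows between the markers)
    first <- TRUE
    for (row in seq_len(nrow(dat))) {
      if (dat[row, "1"] == "LogFrameStart") {
        sample_data <- dat[(row + 1):(row + 29), ]
        if (first) {
          # first trial: keep the label column and its value column
          newdata <- sample_data
          first <- FALSE
        } else {
          # later trials: append only the value column
          newdata <- cbind(newdata, sample_data[, 2])
        }
      }
    }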
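The loop above hard-codes 29 rows per trial. A minimal sketch that instead locates each pair of markers with `which()`, so the block length is read from the data, is below. `extract_trials()` and `toy` are hypothetical names, not from the question; `toy` is the first two trials of the question's data trimmed to two labels. It assumes labels sit in the first column, values in the second, markers are not nested, and every trial carries the same labels in the same order. With the question's `dat`, `extract_trials(dat)` should give 29 label rows and one column per trial.

```r
# Hypothetical helper: one value column per "LogFrameStart"/"LogFrameEnd" block
extract_trials <- function(d) {
  labels <- d[[1]]
  starts <- which(labels == "LogFrameStart")
  ends   <- which(labels == "LogFrameEnd")
  # assumes markers come in matching, non-empty, non-nested pairs
  stopifnot(length(starts) > 0, length(starts) == length(ends),
            all(ends > starts + 1))

  # rows strictly between each pair of markers (no hard-coded block length)
  blocks <- Map(function(s, e) d[(s + 1):(e - 1), ], starts, ends)

  # every trial must list the same labels, otherwise columns would not line up
  first_labels <- blocks[[1]][[1]]
  stopifnot(all(vapply(blocks, function(b) identical(b[[1]], first_labels),
                       logical(1))))

  # bind the value columns side by side and name them trial1, trial2, ...
  values <- do.call(cbind, lapply(blocks, function(b) b[[2]]))
  out <- data.frame(label = first_labels, values, stringsAsFactors = FALSE)
  names(out) <- c("label", paste0("trial", seq_along(blocks)))
  out
}

# Toy input in the question's shape (header rows, then two trials)
toy <- data.frame(
  `1` = c("HeaderStart", "HeaderEnd",
          "LogFrameStart", "TrialList", "RollMovie", "LogFrameEnd",
          "LogFrameStart", "TrialList", "RollMovie", "LogFrameEnd"),
  `2` = c("HeaderStart", "HeaderEnd",
          "LogFrameStart", "5", "Two", "LogFrameEnd",
          "LogFrameStart", "4", "Two", "LogFrameEnd"),
  check.names = FALSE, stringsAsFactors = FALSE
)

extract_trials(toy)
```

Header rows are skipped automatically because they sit before the first "LogFrameStart".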

Peter · Accepted Answer

This is one approach using the dplyr and tidyr packages.
A list may be a better way to manage this data; I suppose it depends on what you intend to do with it next.
This approach separates the data into two data frames: what I assume are the metadata and the trial data.
For convenience, it stores the two data frames in a list.
Finally, it combines them into a new data frame laid out like your expected output.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# initialise an empty list
dat_ls <- vector("list", length = 2)

# meta data as a data frame in the first list element
dat_ls[[1]] <- 
  slice_head(dat, n = 13) |> 
  rename(col1 = `1`, col2 = `2`)

# trial list data frame in the second list element
dat_ls[[2]] <- dat |> 
  filter(row_number() > 13) |> 
  rename(col1 = `1`, col2 = `2`) |> 
  mutate(trial_id = ifelse(col1 == "TrialList", col2, NA_character_)) |> 
  fill(trial_id) |> 
  mutate(trial_id = as.numeric(trial_id),
         trial_id = c(FALSE, trial_id[-length(trial_id)] != trial_id[-1]),
         # gives each group a unique id and prepares for bind_rows with meta data
         trial_id = paste0("col", cumsum(trial_id) + 2)) |>
  pivot_wider(names_from = trial_id, values_from = col2)
  

# Combine the two data frames into one and replace NAs with empty strings:

df_new <- 
  bind_rows(dat_ls[[1]], dat_ls[[2]]) |> 
  mutate(across(everything(), ~ifelse(is.na(.x), "", .x)))

df_new
#>                        col1                   col2             col3        col4
#> 1               HeaderStart            HeaderStart                             
#> 2                 LevelName                Session                             
#> 3                 LevelName                  Trial                             
#> 4                 LevelName              LogLevel5                             
#> 5                 LevelName              LogLevel7                             
#> 6                 LevelName              LogLevel9                             
#> 7                Experiment       GameOfDice_CATCH                             
#> 8               SessionTime               10:39:59                             
#> 9                   Subject                    999                             
#> 10         DataFileBasename GameOfDice_CATCH-999-1                             
#> 11                    Group                      1                             
#> 12                HeaderEnd              HeaderEnd                             
#> 13            LogFrameStart          LogFrameStart                             
#> 14                TrialList                      5                4           5
#> 15                  Running              TrialList        TrialList   TrialList
#> 16          TrialListSample                      1                3           5
#> 17           PlaceBetDEVICE                 Button           Button      Button
#> 18        PlaceBetOnsetTime                 199369           246519      269069
#> 19           PlaceBetRTTime                 231578           248704      272355
#> 20               PlaceBetRT                  32209             2185        3286
#> 21            PlaceBetCRESP                                                    
#> 22                   Result                    200              500        1000
#> 23       DiceRollOnsetDelay                    367              281         285
#> 24    DiceRollDurationError                -999999          -999999     -999999
#> 25              DiceRollACC                      0                0           0
#> 26             DiceRollRESP                                                    
#> 27 DiceRollOnsetToOnsetTime                      0                0           0
#> 28                    Level                      3                3           3
#> 29                Procedure              TrialProc        TrialProc   TrialProc
#> 30                RollMovie                    Two              Two        Five
#> 31           TrialListCycle                      1                1           1
#> 32             StartBalance                   1200             1800        2700
#> 33       PlaceBetOnsetDelay                     66              117          84
#> 34    PlaceBetDurationError                -999999          -999999     -999999
#> 35              PlaceBetACC                      0                0           0
#> 36             PlaceBetRESP       TwoThreeFourFive ThreeFourFiveSix OneTwoThree
#> 37 PlaceBetOnsetToOnsetTime                      0                0           0
#> 38               EndBalance                   1300             1700        2500
#> 39        DiceRollOnsetTime                 241869           264386      282436
#> 40           DiceRollRTTime                      0                0           0
#> 41               DiceRollRT                      0                0           0
#> 42            DiceRollCRESP                                                    
#> 43              LogFrameEnd            LogFrameEnd      LogFrameEnd LogFrameEnd
#> 44            LogFrameStart          LogFrameStart    LogFrameStart

Created on 2022-10-18 with reprex v2.0.2
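In the answer's `trial_id` step, `fill()` copies each `TrialList` value down, `c(FALSE, x[-n] != x[-1])` flags where that value changes, and `cumsum()` turns the flags into group numbers. That relies on neighbouring trials having different `TrialList` values (here 5, 4, 5); if two consecutive trials shared a value they would be merged into one group, and `pivot_wider()` would warn and return list-columns. It is also why the output ends with an extra `LogFrameStart` row (row 44): `fill()` assigns each later `LogFrameStart` to the preceding trial. A minimal sketch of an alternative counts the start markers instead, assuming every trial opens with a `LogFrameStart` row and nothing sits between a `LogFrameEnd` and the next `LogFrameStart`. The `toy` data frame is hypothetical, with `TrialList` deliberately set to 5 in both trials.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical toy input shaped like the question's data; both trials share
# TrialList = "5" on purpose, to show the grouping does not depend on it
toy <- data.frame(
  `1` = c("HeaderStart", "HeaderEnd",
          "LogFrameStart", "TrialList", "RollMovie", "LogFrameEnd",
          "LogFrameStart", "TrialList", "RollMovie", "LogFrameEnd"),
  `2` = c("HeaderStart", "HeaderEnd",
          "LogFrameStart", "5", "Two", "LogFrameEnd",
          "LogFrameStart", "5", "Five", "LogFrameEnd"),
  check.names = FALSE, stringsAsFactors = FALSE
)

trials <- toy |>
  rename(col1 = `1`, col2 = `2`) |>
  # each "LogFrameStart" opens a new trial; header rows get trial 0
  mutate(trial = cumsum(col1 == "LogFrameStart")) |>
  # keep only rows inside a trial and drop the marker rows themselves
  filter(trial > 0, !col1 %in% c("LogFrameStart", "LogFrameEnd")) |>
  # one value column per trial: trial1, trial2, ...
  pivot_wider(names_from = trial, values_from = col2, names_prefix = "trial")

trials
```

The header rows can still be taken separately with `slice_head()`, as in the answer.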
