Reputation: 141
I have a data frame that looks like this (shortened version):
dat <- data.frame(matrix(NA, nrow = 105, ncol = 2))
colnames(dat) <- c("1","2")
dat[,1] <- c("HeaderStart","LevelName","LevelName","LevelName","LevelName","LevelName","Experiment","SessionTime","Subject","DataFileBasename",
"Group","HeaderEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
"PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
"DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
"PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",
"DiceRollRT","DiceRollCRESP","LogFrameEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
"PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
"DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
"PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",
"DiceRollRT","DiceRollCRESP","LogFrameEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
"PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
"DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
"PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",
"DiceRollRT","DiceRollCRESP","LogFrameEnd")
dat[,2] <- c("HeaderStart","Session","Trial","LogLevel5","LogLevel7","LogLevel9","GameOfDice_CATCH","10:39:59","999","GameOfDice_CATCH-999-1",
"1","HeaderEnd","LogFrameStart","5","TrialList","1","Button","199369","231578","32209","","200","367","-999999","0","","0","3",
"TrialProc","Two","1","1200","66","-999999","0","TwoThreeFourFive","0","1300","241869","0","0","","LogFrameEnd","LogFrameStart",
"4","TrialList","3","Button","246519","248704","2185","","500","281","-999999","0","","0","3","TrialProc","Two","1",
"1800","117","-999999","0","ThreeFourFiveSix","0","1700","264386","0","0","","LogFrameEnd","LogFrameStart","5",
"TrialList","5","Button","269069","272355","3286","","1000","285","-999999","0","","0","3","TrialProc","Five","1","2700",
"84","-999999","0","OneTwoThree","0","2500","282436","0","0","","LogFrameEnd")
How can I grab all of the data between each "LogFrameStart" and "LogFrameEnd" and place it into a new data frame that looks like this?...
Edit/Answer:
I ended up writing a for loop instead, which solved the problem:
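# Copy the 29 rows that follow each "LogFrameStart" marker; each trial becomes a column
first <- TRUE  # flag instead of a counter named `c`, which would mask base::c
for (row in seq_len(nrow(dat))) {
  if (dat[row, "1"] == "LogFrameStart") {
    sample_data <- dat[(row + 1):(row + 29), ]
    if (first) {
      # the first trial supplies the field names and its own values
      newdata <- sample_data
      first <- FALSE
    } else {
      # later trials contribute only their values
      newdata <- cbind(newdata, sample_data[, 2])
    }
  }
}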
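The loop above assumes that every frame is exactly 29 rows long. A minimal base R sketch that instead pairs each "LogFrameStart" with its "LogFrameEnd" might look like the following. The `toy` data frame and the `extract_frames()` helper are hypothetical names used only for illustration, and the sketch assumes every frame lists the same fields in the same order.

```r
# Toy stand-in for `dat`: a short header, then two frames of three fields each
toy <- data.frame(
  key = c("HeaderStart", "Subject", "HeaderEnd",
          "LogFrameStart", "TrialList", "Result", "EndBalance", "LogFrameEnd",
          "LogFrameStart", "TrialList", "Result", "EndBalance", "LogFrameEnd"),
  value = c("HeaderStart", "999", "HeaderEnd",
            "LogFrameStart", "5", "200", "1300", "LogFrameEnd",
            "LogFrameStart", "4", "500", "1700", "LogFrameEnd"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: locate the marker pairs instead of hard-coding a frame length
extract_frames <- function(d) {
  starts <- which(d[[1]] == "LogFrameStart")
  ends   <- which(d[[1]] == "LogFrameEnd")
  stopifnot(length(starts) == length(ends), all(starts < ends))

  # Values strictly between each start/end pair, one character vector per trial
  values <- Map(function(s, e) d[[2]][(s + 1):(e - 1)], starts, ends)
  names(values) <- paste0("trial", seq_along(values))

  # Field names are taken from the first frame (assumed identical in all frames)
  fields <- d[[1]][(starts[1] + 1):(ends[1] - 1)]
  data.frame(field = fields, values, stringsAsFactors = FALSE)
}

frames <- extract_frames(toy)
frames
```

With the real data, `extract_frames(dat)` should return one row per logged field and one column per trial, as long as the frames really do share the same layout.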
Upvotes: 0
Views: 54
Reputation: 12729
This is one approach using the dplyr and tidyr packages.
A list may be a better way to manage this data; it depends on what you intend to do with it next.
This approach separates the data into two data frames, which I assume hold the meta data and the trial data.
For convenience, the two data frames are stored in a list.
Finally, they are combined into a new data frame laid out like your expected output:
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
# initialise an empty list
dat_ls <- vector("list", length = 2)
# meta data as a data frame in the first list element
dat_ls[[1]] <-
  slice_head(dat, n = 13) |>
  rename(col1 = `1`, col2 = `2`)
# trial list data frame in the second list element
dat_ls[[2]] <- dat |>
  filter(row_number() > 13) |>
  rename(col1 = `1`, col2 = `2`) |>
  # col2 is character, so use a character NA for the rows that are not "TrialList"
  mutate(trial_id = ifelse(col1 == "TrialList", col2, NA_character_)) |>
  fill(trial_id) |>
  mutate(trial_id = as.numeric(trial_id),
         # TRUE wherever the TrialList value differs from the previous row
         trial_id = c(FALSE, trial_id[-length(trial_id)] != trial_id[-1]),
         # gives each group a unique id and prepares for bind_rows with meta data
         trial_id = paste0("col", cumsum(trial_id) + 2)) |>
  pivot_wider(names_from = trial_id, values_from = col2)
# To combine the data frames into one and remove NAs:
df_new <-
  bind_rows(dat_ls[[1]], dat_ls[[2]]) |>
  mutate(across(everything(), ~ifelse(is.na(.x), "", .x)))
df_new
#>                        col1                   col2             col3        col4
#> 1               HeaderStart            HeaderStart
#> 2                 LevelName                Session
#> 3                 LevelName                  Trial
#> 4                 LevelName              LogLevel5
#> 5                 LevelName              LogLevel7
#> 6                 LevelName              LogLevel9
#> 7                Experiment       GameOfDice_CATCH
#> 8               SessionTime               10:39:59
#> 9                   Subject                    999
#> 10         DataFileBasename GameOfDice_CATCH-999-1
#> 11                    Group                      1
#> 12                HeaderEnd              HeaderEnd
#> 13            LogFrameStart          LogFrameStart
#> 14                TrialList                      5                4           5
#> 15                  Running              TrialList        TrialList   TrialList
#> 16          TrialListSample                      1                3           5
#> 17           PlaceBetDEVICE                 Button           Button      Button
#> 18        PlaceBetOnsetTime                 199369           246519      269069
#> 19           PlaceBetRTTime                 231578           248704      272355
#> 20               PlaceBetRT                  32209             2185        3286
#> 21            PlaceBetCRESP
#> 22                   Result                    200              500        1000
#> 23       DiceRollOnsetDelay                    367              281         285
#> 24    DiceRollDurationError                -999999          -999999     -999999
#> 25              DiceRollACC                      0                0           0
#> 26             DiceRollRESP
#> 27 DiceRollOnsetToOnsetTime                      0                0           0
#> 28                    Level                      3                3           3
#> 29                Procedure              TrialProc        TrialProc   TrialProc
#> 30                RollMovie                    Two              Two        Five
#> 31           TrialListCycle                      1                1           1
#> 32             StartBalance                   1200             1800        2700
#> 33       PlaceBetOnsetDelay                     66              117          84
#> 34    PlaceBetDurationError                -999999          -999999     -999999
#> 35              PlaceBetACC                      0                0           0
#> 36             PlaceBetRESP       TwoThreeFourFive ThreeFourFiveSix OneTwoThree
#> 37 PlaceBetOnsetToOnsetTime                      0                0           0
#> 38               EndBalance                   1300             1700        2500
#> 39        DiceRollOnsetTime                 241869           264386      282436
#> 40           DiceRollRTTime                      0                0           0
#> 41               DiceRollRT                      0                0           0
#> 42            DiceRollCRESP
#> 43              LogFrameEnd            LogFrameEnd      LogFrameEnd LogFrameEnd
#> 44            LogFrameStart          LogFrameStart    LogFrameStart
Created on 2022-10-18 with reprex v2.0.2
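One caveat: the pipeline above starts a new trial column whenever the TrialList value changes, so two consecutive trials with the same TrialList value would end up in the same column. A possible variation, sketched here on a small made-up tibble (`toy`, `trial`, and `wide` are names introduced only for this illustration), is to number the frames by counting the "LogFrameStart" markers. It also drops the marker rows, which is an assumption about the layout you want.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Toy stand-in for the trial rows of `dat`; both trials share TrialList == "5"
toy <- tibble(
  col1 = rep(c("LogFrameStart", "TrialList", "Result", "LogFrameEnd"), times = 2),
  col2 = c("LogFrameStart", "5", "200", "LogFrameEnd",
           "LogFrameStart", "5", "500", "LogFrameEnd")
)

wide <- toy |>
  # running count of start markers: every row of the nth frame is labelled "trialn"
  mutate(trial = paste0("trial", cumsum(col1 == "LogFrameStart"))) |>
  # drop the marker rows so only the logged fields remain
  filter(!col1 %in% c("LogFrameStart", "LogFrameEnd")) |>
  # one row per field, one column per trial
  pivot_wider(names_from = trial, values_from = col2)

wide
```

Because the grouping depends only on the markers, it does not matter whether neighbouring trials happen to share a TrialList value.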
Upvotes: 1