mpvalenc

Reputation: 61

Inserting NA's into specific rows and columns in R

Here is a sample of my dataframe:

df3 <- data.frame(Frame = c(219388, 219389, 219390, 211387, 211388, 211389), Time = c("2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39"),task = c("hop", "hop", "hop", "vj", "vj", "vj"), limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))

I want to add NAs to specific rows in the Frame and Time columns (the number of NA rows to add will vary in my real dataset). I also need the task, limb, and trial columns to continue accordingly (i.e. hop, L, trial1 carry on through the NA rows). My expected output looks like this:

> df3 
Frame             Time               task     limb    trial   
219388    2020-06-05 13:26:39        hop       L      trial1
219389    2020-06-05 13:26:39        hop       L      trial1
219390    2020-06-05 13:26:39        hop       L      trial1
NA                 NA                hop       L      trial1
NA                 NA                hop       L      trial1
NA                 NA                hop       L      trial1
211387    2020-06-05 13:26:39        vj        R      trial2
211388    2020-06-05 13:26:39        vj        R      trial2
211389    2020-06-05 13:26:39        vj        R      trial2
NA                 NA                vj        R      trial2
NA                 NA                vj        R      trial2

I've tried insertRows from the berryFunctions package; however, this sets the whole row to NA, and I need the task, limb, and trial columns to continue.

insertRows(df3, r=c(3:5), new=NA, rcurrent=FALSE)

Any help or suggestions would be much appreciated, thank you!

Upvotes: 1

Views: 1227

Answers (1)

akrun

Reputation: 887941

We could use group_split on the 'task' to 'trial' columns to split the data into a list of data.frames, then loop over that list with map2, passing along the number of NA rows to add to each group. For each element we slice the first row, set 'Frame' and 'Time' to NA, replicate that row with uncount using the count passed in by map2, and append the result to the original chunk with bind_rows. Because we are using map2_dfr, the list is row-bound into a single data.frame.

library(dplyr) #1.0.0
library(purrr)
library(tidyr)
df3 %>%
     group_split(across(task:trial)) %>%
     map2_dfr(c(3, 2), ~ 
         slice(.x, 1) %>% 
         mutate(across(Frame:Time, ~NA)) %>% 
         uncount(.y) %>% 
         bind_rows(.x, .))
# A tibble: 11 x 5
#    Frame Time                task  limb  trial 
#    <dbl> <chr>               <chr> <chr> <chr> 
# 1 219388 2020-06-05 13:26:39 hop   L     trial1
# 2 219389 2020-06-05 13:26:39 hop   L     trial1
# 3 219390 2020-06-05 13:26:39 hop   L     trial1
# 4     NA <NA>                hop   L     trial1
# 5     NA <NA>                hop   L     trial1
# 6     NA <NA>                hop   L     trial1
# 7 211387 2020-06-05 13:26:39 vj    R     trial2
# 8 211388 2020-06-05 13:26:39 vj    R     trial2
# 9 211389 2020-06-05 13:26:39 vj    R     trial2
#10     NA <NA>                vj    R     trial2
#11     NA <NA>                vj    R     trial2

group_split is similar to base R split, except that it has an option to either keep or drop the grouping variables in the resulting list of data.frames (and it does not name the list elements). The idea is to split the data into chunks in which the values of the grouping columns are the same. The dataset is therefore split automatically, with no need to specify by hand the rows after which the NA rows should be added.
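To make the split-then-pad idea concrete without any packages, here is a minimal base R sketch. It assumes the sample df3 from the question; the names n_pad, key, chunks and pad_chunk are made up for illustration, and the pad counts c(3, 2) are simply the ones used above.

```r
# Sample data from the question
df3 <- data.frame(
  Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
  Time  = rep("2020-06-05 13:26:39", 6),
  task  = rep(c("hop", "vj"), each = 3),
  limb  = rep(c("L", "R"), each = 3),
  trial = rep(c("trial1", "trial2"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical pad counts: one value per group, in order of appearance
n_pad <- c(3, 2)

# Build one key per row from the grouping columns and keep the groups
# in the order in which they first appear in the data
key <- paste(df3$task, df3$limb, df3$trial, sep = "|")
chunks <- split(df3, factor(key, levels = unique(key)))

# Append n NA rows to a chunk, carrying the grouping columns forward
pad_chunk <- function(chunk, n) {
  if (n == 0) return(chunk)      # nothing to append for this group
  pad <- chunk[rep(1, n), ]      # repeat the first row n times
  pad$Frame <- NA_real_          # blank out the measurement columns
  pad$Time  <- NA_character_
  rbind(chunk, pad)
}

result <- do.call(rbind, unname(Map(pad_chunk, chunks, n_pad)))
rownames(result) <- NULL
result
```

Keeping the groups in order of appearance (rather than sorted) means the positions in n_pad line up with the order of the groups in the data, which may be easier to reason about when the counts vary.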


Also, if the number of NA rows to add is the same for every group, another option is group_by followed by summarise (in dplyr 1.0.0, summarise can return more than one row per group)

df3  %>%
     group_by(across(task:trial)) %>%
     summarise(across(everything(), ~ c(., rep(NA, 3))))
# A tibble: 12 x 5
# Groups:   task, limb, trial [2]
#   task  limb  trial   Frame Time               
#   <chr> <chr> <chr>   <dbl> <chr>              
# 1 hop   L     trial1 219388 2020-06-05 13:26:39
# 2 hop   L     trial1 219389 2020-06-05 13:26:39
# 3 hop   L     trial1 219390 2020-06-05 13:26:39
# 4 hop   L     trial1     NA <NA>               
# 5 hop   L     trial1     NA <NA>               
# 6 hop   L     trial1     NA <NA>               
# 7 vj    R     trial2 211387 2020-06-05 13:26:39
# 8 vj    R     trial2 211388 2020-06-05 13:26:39
# 9 vj    R     trial2 211389 2020-06-05 13:26:39
#10 vj    R     trial2     NA <NA>               
#11 vj    R     trial2     NA <NA>               
#12 vj    R     trial2     NA <NA>      
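
In dplyr 1.1.0 and later, returning more than one row per group from summarise is deprecated, and reframe is the verb intended for that case. A sketch of the same constant-padding idea, assuming dplyr >= 1.1.0 is installed and using the sample df3 from the question (n_extra and padded are names made up here):

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where reframe() is available

# Sample data from the question
df3 <- data.frame(
  Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
  Time  = rep("2020-06-05 13:26:39", 6),
  task  = rep(c("hop", "vj"), each = 3),
  limb  = rep(c("L", "R"), each = 3),
  trial = rep(c("trial1", "trial2"), each = 3),
  stringsAsFactors = FALSE
)

n_extra <- 3  # hypothetical constant number of NA rows per group

# reframe() may return any number of rows per group and drops the grouping
padded <- df3 %>%
  group_by(task, limb, trial) %>%
  reframe(across(everything(), ~ c(.x, rep(NA, n_extra))))
padded
```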

Also, with berryFunctions, after creating the NA rows with insertRows, we can fill the columns of interest with fill from tidyr

library(berryFunctions)
insertRows(df3, r=4:6, new=NA, rcurrent= FALSE) %>% 
       insertRows(., r = 10) %>%
       fill(task:trial)
#    Frame                Time task limb  trial
#1  219388 2020-06-05 13:26:39  hop    L trial1
#2  219389 2020-06-05 13:26:39  hop    L trial1
#3  219390 2020-06-05 13:26:39  hop    L trial1
#4      NA                <NA>  hop    L trial1
#5      NA                <NA>  hop    L trial1
#6      NA                <NA>  hop    L trial1
#7  211387 2020-06-05 13:26:39   vj    R trial2
#8  211388 2020-06-05 13:26:39   vj    R trial2
#9  211389 2020-06-05 13:26:39   vj    R trial2
#10     NA                <NA>   vj    R trial2
#11     NA                <NA>   vj    R trial2
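
The fill step works because tidyr's fill, by default, carries the last non-missing value downward. As an illustration only (fill_down is a made-up helper, not part of tidyr or berryFunctions), the same behaviour on a single vector can be sketched in base R:

```r
# Hypothetical helper: carry the last non-NA value forward
fill_down <- function(x) {
  # Index of the most recent non-NA element at each position (0 if none yet)
  idx <- cummax(seq_along(x) * (!is.na(x)))
  # Positions before the first non-NA value have nothing to carry forward
  idx[idx == 0] <- NA
  x[idx]
}

task <- c("hop", NA, NA, "vj", NA)
fill_down(task)
# "hop" "hop" "hop" "vj" "vj"
```

Leading NAs stay NA in this sketch, since there is no earlier value to copy.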

Upvotes: 1
