Reputation: 3
I have a loop that recodes values of a column and breaks when a condition is met. I would like to use this loop, or its basic concept, on a list of data frames with the same format.
Sample data:
Id <- as.factor(c(rep("01001", 11), rep("01043", 11), rep("01065", 11), rep("01069", 11)))
YearCode <- as.numeric(rep(1:11, 4))
Type <- c(NA,NA,NA,NA,NA,NA,NA,2,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2,NA)
test <- NA
sample_df <- data.frame(Id, YearCode, Type, test)
# A part of sample_df
one_df <- subset(sample_df, sample_df$Id=="01069")
This for loop works fine for one data frame:
# example for loop using example data frame "one_df"
for (i in seq_along(one_df$Id)) {
  if (is.na(one_df$Type[i])) {  # if Type is NA, recode to 0
    one_df$test[i] <- 0
  } else {  # stop when Type is not NA, and leave the remaining NAs that come after
    break
  }
}
However, I have many data frames with this same format in a list. I would like to keep them in the list and apply this loop over the whole list.
# example list : split data frame into list by Id
sample_list <- split(sample_df, sample_df$Id, drop = TRUE)
I've looked around other posts such as this one, but I get stuck when trying to loop over each data frame in the list or write a similar function using lapply. How can I modify this loop to work on the list (sample_list), using either a for loop, lapply, or something else?
Any tips would be greatly appreciated, let me know if I need to clarify anything. Thanks!
Upvotes: 0
Views: 2336
Reputation: 23574
I think the following does what you described. First, I created a new column called test with if_else(): if complete.cases(Type) is TRUE, it takes the value from Type; otherwise it takes 0. The next step replaces some of those 0s with NA, because you do not want 0s in rows that come after the first numeric value in Type. For instance, you do not want 0s after the 10th row for Id == 01069. So I used the condition row_number() > which(complete.cases(Type))[1], which you can read as "is this row number larger than the row number of the first numeric value?" Wherever that condition is TRUE, test is set to NA. Part of the result for sample_df is shown after the code. I hope this helps with your work.
library(dplyr)
sample_df %>%
  group_by(Id) %>%
  mutate(test = if_else(complete.cases(Type), Type, 0),
         test = if_else(row_number() > which(complete.cases(Type))[1],
                        NA_real_, test)) -> out
# Id YearCode Type test
# <fctr> <dbl> <dbl> <dbl>
#1 01001 1 NA 0
#2 01001 2 NA 0
#3 01001 3 NA 0
#4 01001 4 NA 0
#5 01001 5 NA 0
#6 01001 6 NA 0
#7 01001 7 NA 0
#8 01001 8 2 2
#9 01001 9 NA NA
#10 01001 10 NA NA
#11 01001 11 NA NA
#------------------------------
#34 01069 1 NA 0
#35 01069 2 NA 0
#36 01069 3 NA 0
#37 01069 4 NA 0
#38 01069 5 NA 0
#39 01069 6 NA 0
#40 01069 7 NA 0
#41 01069 8 NA 0
#42 01069 9 NA 0
#43 01069 10 2 2
#44 01069 11 NA NA
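To see what that condition does in isolation, here is a small base R sketch on two made-up vectors. The names type_with_value and type_all_na are hypothetical (they are not part of sample_df), and seq_along() stands in for row_number(). It also suggests that a group with no numeric value at all gives NA for the condition, so I would expect test to stay NA for such a group with the code above.

```r
# Hypothetical toy vectors, not taken from sample_df
type_with_value <- c(NA, NA, 2, NA)
type_all_na     <- c(NA_real_, NA_real_, NA_real_)

# Position of the first non-missing value
first_pos <- which(complete.cases(type_with_value))[1]
first_pos
# 3

# TRUE only for rows that come after the first non-missing value
after_first <- seq_along(type_with_value) > first_pos
after_first
# FALSE FALSE FALSE TRUE

# With no non-missing value, which() is empty, so [1] gives NA
first_pos_none <- which(complete.cases(type_all_na))[1]
first_pos_none
# NA

# Comparing against NA yields NA for every row
after_first_none <- seq_along(type_all_na) > first_pos_none
after_first_none
# NA NA NA
```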
EDIT
According to the OP's comment, test should be 0 in every row when Type contains only NAs. The following handles that case as well.
sample_df %>%
  group_by(Id) %>%
  mutate(test = if_else(complete.cases(Type), Type, 0),
         test = if_else(row_number() > which(complete.cases(Type))[1],
                        NA_real_, test),
         foo = sum(Type, na.rm = TRUE),
         test = replace(test, which(foo == 0), 0)) %>%
  select(-foo) -> out
# A part of the result
# Id YearCode Type test
# <fctr> <dbl> <dbl> <dbl>
#1 01001 1 NA 0
#2 01001 2 NA 0
#3 01001 3 NA 0
#4 01001 4 NA 0
#5 01001 5 NA 0
#6 01001 6 NA 0
#7 01001 7 NA 0
#8 01001 8 2 2
#9 01001 9 NA NA
#10 01001 10 NA NA
#11 01001 11 NA NA
#12 01043 1 NA 0
#13 01043 2 NA 0
#14 01043 3 NA 0
#15 01043 4 NA 0
#16 01043 5 NA 0
#17 01043 6 NA 0
#18 01043 7 NA 0
#19 01043 8 NA 0
#20 01043 9 NA 0
#21 01043 10 NA 0
#22 01043 11 NA 0
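If you would rather avoid the helper column, one base R alternative might be to count the non-missing values seen so far with cumsum(). This is only a sketch on made-up data: recode_type and toy_df are names I invented here, and it assumes the rows are already ordered by YearCode within each Id.

```r
# Hypothetical helper (not part of the dplyr code above):
# recode the Type vector of a single Id.
recode_type <- function(type) {
  n_seen <- cumsum(!is.na(type))          # non-NA values seen up to each row
  out <- rep(NA_real_, length(type))      # rows after the first value stay NA
  out[n_seen == 0] <- 0                   # rows before any value become 0
  first_value <- which(n_seen == 1 & !is.na(type))
  out[first_value] <- type[first_value]   # keep the first value itself
  out
}

# Toy data (made up): "a" has one value, "b" has only NAs
toy_df <- data.frame(
  Id   = rep(c("a", "b"), each = 4),
  Type = c(NA, NA, 2, NA, NA, NA, NA, NA)
)

# ave() applies the helper within each Id and returns values in row order
toy_df$test <- ave(toy_df$Type, toy_df$Id, FUN = recode_type)
toy_df$test
# 0 0 2 NA 0 0 0 0
```

Unlike the foo == 0 check, this does not depend on the sum of Type, so a group whose only values happen to be 0 would not be treated as all-NA.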
Upvotes: 2
Reputation: 2140
Is there an issue with wrapping the loop in a function and using lapply? It seems to work:
#rm(list=ls())
Id <- as.factor(c(rep("01001", 11), rep("01043", 11), rep("01065", 11), rep("01069", 11)))
YearCode <- as.numeric(rep(1:11, 4))
Type <- c(NA,NA,NA,NA,NA,NA,NA,2,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2,NA)
test <- NA
sample_df <- data.frame(Id, YearCode, Type, test)
# A part of sample_df
one_df <- subset(sample_df, sample_df$Id=="01069")
sample_list <- split(sample_df, sample_df$Id, drop = TRUE)
####################################
# for loop as function
fnX <- function(myDF) {
  for (i in seq_along(myDF$Id)) {
    if (is.na(myDF$Type[i])) {  # if Type is NA, recode to 0
      myDF$test[i] <- 0
    } else {  # stop and leave the remaining NAs that come after
      break
    }
  }
  myDF  # return the modified copy
}
# apply the function to one data frame, then to every data frame in the list
fnX(sample_list$`01069`)
lapply(sample_list, fnX)
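One thing to keep in mind is that lapply returns a new list rather than changing the original, so the result has to be assigned back if you want to keep it. Here is a minimal sketch on a made-up list; toy_list and recode_leading_na are hypothetical names, and the function just repeats the loop above so the snippet runs on its own.

```r
# Toy list with the same columns as sample_list (values are made up)
toy_list <- list(
  a = data.frame(Id = "a", YearCode = 1:4, Type = c(NA, NA, 2, NA), test = NA),
  b = data.frame(Id = "b", YearCode = 1:3, Type = c(NA_real_, NA, NA), test = NA)
)

# Same loop as above, wrapped in a function
recode_leading_na <- function(myDF) {
  for (i in seq_along(myDF$Type)) {
    if (is.na(myDF$Type[i])) {
      myDF$test[i] <- 0   # leading NA in Type: recode test to 0
    } else {
      break               # first non-NA Type: stop, leave the rest as NA
    }
  }
  myDF
}

# Assign the result back so the list holds the recoded data frames
toy_list <- lapply(toy_list, recode_leading_na)

toy_list$a$test
# 0 0 NA NA
toy_list$b$test
# 0 0 0

# Optionally stack the list back into a single data frame
combined <- do.call(rbind, toy_list)
```

Note that with this loop test stays NA on the row where Type first has a value, and a data frame whose Type is all NA ends up with 0 in every row.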
Upvotes: 0