So I was tasked with helping the PI at my research lab come up with a simple function to choose two random values from a dataframe while ignoring NAs. The data we work with is self-reported, so some items are left unanswered and show up as NAs. We don't drop our NAs, but during subscale analysis we do need to select some values while ignoring them.
I have made a simple test dataframe and begun working on my test function. I know it won't work as I need it to yet, but I am running into a weird error. Here is my code:
rm(list = ls()) ### clear all ###
dev.off(dev.list()["RStudioGD"])
options(scipen=999)
install.packages("dplyr")
install.packages("magrittr")
library("magrittr")
library("dplyr")
df<- data.frame("Var1"=c(1,7,8), "Var2"=c(NA, NA, 9), "Var3"=c(2, NA,10), "Var4"=c(3,5,NA), "Var5"=c(4,7,NA))
#select columns function, picks two random values from data frame not containing NA
select.col <- function(x,y) ##given df x select y columns
df %>%
select_if(is.numeric) %>%
return(z)
b <- select.col(df,2)
So, I'm trying to make a function that uses the second parameter to select random columns from each row, never picking an NA.
I would want b to be a new data set and consist of these data in this format:
1, 2
2, 7
3, 9
(basically my data selected randomly without NA's...kind of like elementary x,y/input-output tables)
I ran into this snag and no matter how I rewrite this function I get the same error of "Multi-argument returns are not permitted."
Any ideas or advice?
EDIT: I fixed the formatting for how I want my data to look. Imagine it like a CSV; so in one column is one value and in another is the other value.
Upvotes: 0
Views: 2112
Reputation: 389335
Do you need something like this?
select.col <- function(x,y) {
nc <- y
nr <- nrow(x)
x %>%
#select only numeric columns
select_if(is.numeric) %>%
#Convert data into a vector
unlist %>%
#Remove NA values
na.omit %>%
#Select random nc * nr values
sample(nc * nr) %>%
#Convert it into matrix specifying number of rows and columns
matrix(ncol = nc, nrow = nr) %>%
#Convert into dataframe
as.data.frame()
}
select.col(df,2)
# V1 V2
#1 5 1
#2 3 10
#3 8 2
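Note that unlist pools every value in the data frame, so a value in a given output row may come from any input row. If the values should instead be drawn row by row, one possible sketch in base R follows. It assumes every row has at least n non-NA values, and the helper name sample_per_row is hypothetical, not something from the question.

```r
df <- data.frame(Var1 = c(1, 7, 8), Var2 = c(NA, NA, 9), Var3 = c(2, NA, 10),
                 Var4 = c(3, 5, NA), Var5 = c(4, 7, NA))

# Hypothetical helper: for each row of x, draw n non-NA values at random.
sample_per_row <- function(x, n) {
  picks <- apply(as.matrix(x), 1, function(row) {
    vals <- row[!is.na(row)]        # drop this row's NAs
    stopifnot(length(vals) >= n)    # assumption: enough values remain
    # draw by index so a single remaining value is not treated as 1:value
    unname(vals[sample.int(length(vals), n)])
  })
  # apply() yields one column per input row (or a plain vector when n == 1);
  # refill it so each output row holds the picks for one input row
  as.data.frame(matrix(picks, nrow = nrow(x), byrow = TRUE))
}

set.seed(1)  # only to make the draw repeatable
b <- sample_per_row(df, 2)
```

Here b has one row per row of df and n columns, and each value in row i of b comes from row i of df.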
Upvotes: 0
Reputation: 37661
When you write
df %>%
select_if(is.numeric) %>%
return(z)
It is the same as
return(select_if(df, is.numeric), z)
return does not take two arguments, which is why you get the error message. Also, note that your return statement uses z, but z is not defined anywhere.
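One way around the error is to leave return out of the pipeline entirely, since an R function returns the value of its last expression. Here is a minimal sketch, assuming the goal at this step is only to keep the numeric columns. It uses base R instead of dplyr so it runs without extra packages, and the names select_numeric and Label are hypothetical.

```r
df <- data.frame(Var1 = c(1, 7, 8), Var2 = c(NA, NA, 9),
                 Label = c("a", "b", "c"))

# Hypothetical helper: keep only the numeric columns of x.
# The value of the last expression is what the function returns,
# so no return() call is needed at the end.
select_numeric <- function(x) {
  x[vapply(x, is.numeric, logical(1))]
}

numeric_only <- select_numeric(df)
names(numeric_only)
```

With dplyr the same idea applies: end the pipeline at select_if(is.numeric) instead of piping its result into return.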
Upvotes: 1