Reputation: 101
I'm trying to check the "pin" numbers of cases with missing data for each variable of interest in my dataset.
Here are some fake data:
c <- data.frame(pin = c(1, 2, 3, 4), type = c(1, 1, 2, 2), v1 = c(1, NA, NA,
NA), v2 = c(NA, NA, 1, 1))
I wrote a function "m.pin" to do this:
m.pin <- function(x, data = "c", return = "$pin") {
sect <- gsub("^.*\\[", "\\[", deparse(substitute(x)))
vect <- eval(parse(text = paste(data, return, sect, sep = "")))
return(vect[is.na(x)])
}
And I use it like so:
m.pin(c$v1[c$type == 1])
[1] 2
I wrote a function to apply "m.pin" over a list of variables to only return pins with missing data:
return.m.pin <- function(x, fun = m.pin) {
val.list <- lapply(x, fun)
condition <- lapply(val.list, function(x) length(x) > 0)
val.list[unlist(condition)]
}
But when I apply it, I get this error:
l <- lst(c$v1[c$type == 1], c$v2[c$type == 2])
return.m.pin(l)
Error in parse(text = paste(data, return, sect, sep = "")) :
<text>:1:9: unexpected ']'
1: c$pin[i]]
^
How can I rewrite my function(s) to avoid this issue?
Many thanks!
Upvotes: 0
Views: 612
Reputation: 145965
I would suggest rewriting like this (if this approach is to be taken at all). I call your data d
because c
is already the name of an extremely common function.
# string column names, pass in the data frame as an object
# means no need for eval, parse, substitute, etc.
foo = function(data, na_col, return_col = "pin", filter_col, filter_val) {
if(! missing(filter_col) & ! missing(filter_val)) {
data = data[data[, filter_col] == filter_val, ]
}
data[is.na(data[, na_col]), return_col]
}
# working on the whole data frame
foo(d, na_col = "v1", return_col = "pin")
# [1] 2 3 4
# passing in a subset of the data
foo(d[d$type == 1, ], "v1", "pin")
# [1] 2
# using function arguments to subset the data
foo(d, "v1", "pin", filter_col = "type", filter_val = 1)
# [1] 2
# calling it with changing arguments:
# you could use `Map` or `mapply` to be fancy, but this for loop is nice and clear
inputs = data.frame(na_col = c("v1", "v2"), filter_val = c(1, 2), stringsAsFactors = FALSE)
result = list()
for (i in 1:nrow(inputs)) {
result[[i]] = foo(d, na_col = inputs$na_col[i], return_col = "pin",
filter_col = "type", filter_val = inputs$filter_val[i])
}
result
# [[1]]
# [1] 2
#
# [[2]]
# numeric(0)
A different approach I would suggest is melting your data into a long format, and simply taking a subset of the NA
values, hence getting all combinations of type
and the v*
columns that have NA
values at once. Do this once, and no function is needed to look up individual combinations.
d_long = reshape2::melt(d, id.vars = c("pin", "type"))
library(dplyr)
d_long %>% filter(is.na(value)) %>%
arrange(variable, type)
# pin type variable value
# 1 2 1 v1 NA
# 2 3 2 v1 NA
# 3 4 2 v1 NA
# 4 1 1 v2 NA
# 5 2 1 v2 NA
Upvotes: 2
Reputation: 50718
Please see Gregor's comment for the most critical issues with your code (to add: don't use return
as a variable name as it is the name of a base R function).
It's not clear to me why you want to define a specific function m.pin
, nor what you ultimately are trying to do, but I am assuming this is a critical design component.
Rewriting m.pin
as
m.pin <- function(df, type, vcol) which(df[, "type"] == type & is.na(df[, vcol]))
we get
m.pin(df, 1, "v1")
#[1] 2
Or to identify rows with NA
in "v1"
for all type
s
lapply(unique(df$type), function(x) m.pin(df, x, "v1"))
#[[1]]
#[1] 2
#
#[[2]]
#[1] 3 4
In response to Gregor's comment, perhaps this is what you're after?
by(df, df$type, function(x)
list(v1 = x$pin[which(is.na(x$v1))], v2 = x$pin[which(is.na(x$v2))]))
# df$type: 1
# $v1
# [1] 2
#
# $v2
# [1] 1 2
#
# ------------------------------------------------------------
# df$type: 2
# $v1
# [1] 3 4
#
# $v2
# integer(0)
This returns a list
of the pin
numbers for every type
and NA
entries in v1
/v2
.
df <- data.frame(
pin = c(1, 2, 3, 4),
type = c(1, 1, 2, 2),
v1 = c(1, NA, NA, NA),
v2 = c(NA, NA, 1, 1))
Upvotes: 2