Jklein

Reputation: 101

Parse unexpected symbol error in function applied over list

I'm trying to check the "pin" numbers of cases with missing data for each variable of interest in my dataset.

Here are some fake data:

c <- data.frame(pin = c(1, 2, 3, 4), type = c(1, 1, 2, 2),
                v1 = c(1, NA, NA, NA), v2 = c(NA, NA, 1, 1))

I wrote a function "m.pin" to do this:

m.pin <- function(x, data = "c", return = "$pin") {
  sect <- gsub("^.*\\[", "\\[", deparse(substitute(x)))
  vect <- eval(parse(text = paste(data, return, sect, sep = "")))
  return(vect[is.na(x)])
}

And I use it like so:

m.pin(c$v1[c$type == 1])
[1] 2

I wrote a function to apply "m.pin" over a list of variables to only return pins with missing data:

return.m.pin <- function(x, fun = m.pin) {
  val.list <- lapply(x, fun)
  condition <- lapply(val.list, function(x) length(x) > 0)
  val.list[unlist(condition)]
}

But when I apply it, I get this error:

l <- list(c$v1[c$type == 1], c$v2[c$type == 2])
return.m.pin(l) 
Error in parse(text = paste(data, return, sect, sep = "")) :
  <text>:1:9: unexpected ']'
1: c$pin[i]]
            ^

How can I rewrite my function(s) to avoid this issue?

Many thanks!

Upvotes: 0

Views: 612

Answers (2)

Gregor Thomas

Reputation: 145965

I would suggest rewriting like this (if this approach is to be taken at all). I call your data d because c is already the name of an extremely common function.

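# the question's data, renamed from c to d
d = data.frame(pin = c(1, 2, 3, 4), type = c(1, 1, 2, 2),
               v1 = c(1, NA, NA, NA), v2 = c(NA, NA, 1, 1))

# string column names, pass in the data frame as an object
# means no need for eval, parse, substitute, etc.
foo = function(data, na_col, return_col = "pin", filter_col, filter_val) {
  # && (not &) is the scalar "and" meant for if() conditions
  if (!missing(filter_col) && !missing(filter_val)) {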
    data = data[data[, filter_col] == filter_val, ]
  }  
  data[is.na(data[, na_col]), return_col]
}

# working on the whole data frame
foo(d, na_col = "v1", return_col = "pin")
# [1] 2 3 4

# passing in a subset of the data
foo(d[d$type == 1, ], "v1", "pin")
# [1] 2

# using function arguments to subset the data
foo(d, "v1", "pin", filter_col = "type", filter_val = 1)
# [1] 2


# calling it with changing arguments:
# you could use `Map` or `mapply` to be fancy, but this for loop is nice and clear
inputs = data.frame(na_col = c("v1", "v2"), filter_val = c(1, 2), stringsAsFactors = FALSE)
result = list()
for (i in seq_len(nrow(inputs))) {
  result[[i]] = foo(d, na_col = inputs$na_col[i], return_col = "pin",
                    filter_col = "type", filter_val = inputs$filter_val[i])
}
result
# [[1]]
# [1] 2
# 
# [[2]]
# numeric(0)
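
If you do want the Map version mentioned in the comment above, a minimal sketch could look like the following. It repeats d and foo so that it runs on its own; result_map and nonempty are names made up for illustration, not part of the original code.

```r
# Question's data and the foo() helper from above, repeated so this runs standalone
d <- data.frame(pin = c(1, 2, 3, 4), type = c(1, 1, 2, 2),
                v1 = c(1, NA, NA, NA), v2 = c(NA, NA, 1, 1))

foo <- function(data, na_col, return_col = "pin", filter_col, filter_val) {
  if (!missing(filter_col) && !missing(filter_val)) {
    data <- data[data[, filter_col] == filter_val, ]
  }
  data[is.na(data[, na_col]), return_col]
}

# Map walks the two vectors in parallel: ("v1", 1), then ("v2", 2).
# Because the first vector is an unnamed character vector, its values
# become the names of the result.
result_map <- Map(
  function(na_col, filter_val) {
    foo(d, na_col = na_col, return_col = "pin",
        filter_col = "type", filter_val = filter_val)
  },
  c("v1", "v2"),
  c(1, 2)
)

# Keep only the entries that actually contain pins
nonempty <- Filter(length, result_map)
nonempty
```

Filter(length, ...) drops the zero-length entries, which seems to be what return.m.pin in the question was trying to do.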

A different approach I would suggest is melting your data into a long format, and simply taking a subset of the NA values, hence getting all combinations of type and the v* columns that have NA values at once. Do this once, and no function is needed to look up individual combinations.

d_long = reshape2::melt(d, id.vars = c("pin", "type"))

library(dplyr)
d_long %>% filter(is.na(value)) %>%
  arrange(variable, type)
#   pin type variable value
# 1   2    1       v1    NA
# 2   3    2       v1    NA
# 3   4    2       v1    NA
# 4   1    1       v2    NA
# 5   2    1       v2    NA
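
If you would rather avoid the extra packages, the same long-format idea can be sketched in base R. This is only an illustration of the approach above, not a replacement for it; vars, d_long_base and missing_rows are hypothetical names.

```r
d <- data.frame(pin = c(1, 2, 3, 4), type = c(1, 1, 2, 2),
                v1 = c(1, NA, NA, NA), v2 = c(NA, NA, 1, 1))

vars <- c("v1", "v2")  # the columns to check for missing values

# Stack the v* columns on top of each other, repeating pin and type
d_long_base <- data.frame(
  pin      = rep(d$pin, times = length(vars)),
  type     = rep(d$type, times = length(vars)),
  variable = rep(vars, each = nrow(d)),
  value    = unlist(d[vars], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only the missing values, ordered by variable and then type
missing_rows <- d_long_base[is.na(d_long_base$value), ]
missing_rows <- missing_rows[order(missing_rows$variable, missing_rows$type), ]
missing_rows[, c("pin", "type", "variable")]
```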

Upvotes: 2

Maurits Evers

Reputation: 50718

Please see Gregor's comment for the most critical issues with your code. In addition, don't use return as a variable name, since it is the name of a base R function.
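
To make the parse error concrete, here is a small sketch of what deparse(substitute(x)) sees when the function is called directly versus through lapply. The helper show_expr and the data frame name d are made up for illustration.

```r
d <- data.frame(pin = c(1, 2, 3, 4), type = c(1, 1, 2, 2),
                v1 = c(1, NA, NA, NA), v2 = c(NA, NA, 1, 1))

# Returns the unevaluated argument expression as a string
show_expr <- function(x) deparse(substitute(x))

# Called directly, the function sees the expression you typed
direct <- show_expr(d$v1[d$type == 1])
direct
# [1] "d$v1[d$type == 1]"

# Inside lapply, the function is called as FUN(X[[i]], ...), so that is
# the expression substitute() captures
via_lapply <- lapply(list(d$v1[d$type == 1]), show_expr)[[1]]
via_lapply
# [1] "X[[i]]"

# The question's gsub then keeps everything from the last "[" onwards
sect <- gsub("^.*\\[", "\\[", via_lapply)
sect
# [1] "[i]]"

paste("c", "$pin", sect, sep = "")
# [1] "c$pin[i]]"
```

The last string is exactly the text that parse() rejects in the question, which is why passing column names as strings is more robust than reconstructing code from the call.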

It's not clear to me why you want to define a specific function m.pin, nor what you ultimately are trying to do, but I am assuming this is a critical design component.

Rewriting m.pin as

m.pin <- function(df, type, vcol) which(df[, "type"] == type & is.na(df[, vcol]))

we get

m.pin(df, 1, "v1")
#[1] 2

Or to identify rows with NA in "v1" for all types

lapply(unique(df$type), function(x) m.pin(df, x, "v1"))
#[[1]]
#[1] 2
#
#[[2]]
#[1] 3 4
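
Note that which() returns row positions, which only coincide with the pin values here because pin happens to run from 1 to 4. A hedged variant that looks up the pin column instead might look like this; m_pin_ids, its id_col argument and the df2 example data are hypothetical and only serve to show the difference.

```r
# Hypothetical data where pin does not match the row number
df2 <- data.frame(
    pin = c(101, 102, 103, 104),
    type = c(1, 1, 2, 2),
    v1 = c(1, NA, NA, NA),
    v2 = c(NA, NA, 1, 1))

# Same condition as m.pin, but returns values from the id column
m_pin_ids <- function(df, type, vcol, id_col = "pin") {
    rows <- which(df[, "type"] == type & is.na(df[, vcol]))
    df[rows, id_col]
}

which(df2[, "type"] == 1 & is.na(df2[, "v1"]))
#[1] 2

m_pin_ids(df2, 1, "v1")
#[1] 102
```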

Update

In response to Gregor's comment, perhaps this is what you're after?

by(df, df$type, function(x)
    list(v1 = x$pin[which(is.na(x$v1))], v2 = x$pin[which(is.na(x$v2))]))
#    df$type: 1
#    $v1
#    [1] 2
#
#    $v2
#    [1] 1 2
#
#    ------------------------------------------------------------
#    df$type: 2
#    $v1
#    [1] 3 4
#
#    $v2
#    numeric(0)

For each type, this returns the pin numbers of the rows that have NA in v1 and in v2.
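
If you only want the combinations that actually have missing values (which is what return.m.pin in the question seems to be after), one possible sketch uses split and lapply to get a plain nested list and then drops the empty entries. The name missing_pins is made up, and the v1/v2 column selection is hard-coded here as an assumption.

```r
df <- data.frame(
    pin = c(1, 2, 3, 4),
    type = c(1, 1, 2, 2),
    v1 = c(1, NA, NA, NA),
    v2 = c(NA, NA, 1, 1))

# One list element per type; within each, one element per variable
missing_pins <- lapply(split(df, df$type), function(x) {
    pins <- lapply(x[c("v1", "v2")], function(col) x$pin[is.na(col)])
    # Drop variables without any missing values for this type
    Filter(length, pins)
})

str(missing_pins)
```

Here the element for type 2 contains only v1, because v2 has no missing values for that type.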


Sample data

df <- data.frame(
    pin = c(1, 2, 3, 4), 
    type = c(1, 1, 2, 2), 
    v1 = c(1, NA, NA, NA), 
    v2 = c(NA, NA, 1, 1))

Upvotes: 2
