Chris

Reputation: 735

Deriving a variable from a column name passed to a function

I've gotten hold of some really messy data and I wrote a function to do some conversions (string to numeric), and I would love to improve it. Basically the function takes a vector of messy character data and converts the data to numeric.

For example:

##  say you had this
library(stringr)  # for str_trim()
df1 <- data.frame(V1 = c("   $25.25", "4,828", "      $7,253"), V2 = c("THIS is bad data", "725", "*error"))

numconv <- function(vec){
    vec <- str_trim(vec)
    vec <- gsub(",|\\$", "", vec)   # drop thousands separators and dollar signs
    if( sum(!grepl( "[0-9]",vec)) == 0){
        vec <- as.numeric(vec)
    }
    if( sum(!grepl( "[0-9]",vec)) != 0){
        print("!!ERROR STRANGE CHARACTERS!!")
    }
    vec  # return the (possibly converted) vector
}

df1$V1recode <- numconv(df1$V1)
df1$V2recode <- numconv(df1$V2)
# [1] "!!ERROR STRANGE CHARACTERS!!"

How can I get the name of the original column inside the function, so that I can paste it into the error message and it instead reads:

!!ERROR STRANGE CHARACTER IN V2!!

I've tried calling names() and colnames() within the function, but this doesn't seem to work.
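A quick check shows why names() and colnames() come up empty (a minimal sketch; show_names() is a hypothetical helper for illustration): once a column is extracted with $, the function receives a bare vector, and the data frame's column name is not attached to it.

```r
# Minimal sketch: show_names() is a hypothetical helper for illustration.
df1 <- data.frame(V1 = c("   $25.25", "4,828", "      $7,253"),
                  V2 = c("THIS is bad data", "725", "*error"))

# Report what names() and colnames() see on the argument the function receives
show_names <- function(vec) {
  list(names = names(vec), colnames = colnames(vec))
}

str(show_names(df1$V2))
# Both elements are NULL: df1$V2 is a plain vector (or factor), so the
# column name "V2" does not travel with it into the function.
```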

Thanks in advance, C

Upvotes: 1

Views: 151

Answers (3)

IRTFM

Reputation: 263471

The old deparse(substitute(.)) trick seems to work.

numconv <- function(vec){
    nam <- deparse(substitute(vec))  # capture the caller's expression before vec is modified
    vec <- gsub(" ","", vec)
    vec <- gsub(",|\\$", "", vec)
    if( sum(!grepl( "[0-9]",vec)) == 0){
        vec <- as.numeric(vec)
    }
    if( sum(!grepl( "[0-9]",vec)) != 0){
        print(paste("!!ERROR STRANGE CHARACTERS!!", nam) )
    }
    vec  # return the (possibly converted) vector
}
df1$V2recode <- numconv(df1$V2)
# [1] "!!ERROR STRANGE CHARACTERS!! df1$V2"

(I didn't load stringr since I thought a gsub call would be more efficient.)
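A variation on the same trick, sketched under the assumption that a caller may sometimes want to supply the label directly: move deparse(substitute(vec)) into a default argument. The label argument is a hypothetical addition, not part of the answer above. The force() call matters: a default argument is only evaluated when first used, and if that happens after vec has been reassigned, substitute() returns the cleaned values instead of the caller's expression.

```r
# Sketch: 'label' is a hypothetical extra argument whose default is the
# caller's expression, so numconv(df1$V2, label = "V2") can override it.
numconv <- function(vec, label = deparse(substitute(vec))) {
  force(label)  # evaluate the default now, while vec is still unevaluated
  vec <- gsub(" ", "", vec)
  vec <- gsub(",|\\$", "", vec)
  if (all(grepl("[0-9]", vec))) {
    vec <- as.numeric(vec)
  } else {
    print(paste("!!ERROR STRANGE CHARACTERS IN", label, "!!"))
  }
  vec
}

df1 <- data.frame(V1 = c("   $25.25", "4,828", "      $7,253"),
                  V2 = c("THIS is bad data", "725", "*error"),
                  stringsAsFactors = FALSE)
df1$V1recode <- numconv(df1$V1)                # numeric, no message
df1$V2recode <- numconv(df1$V2)                # message names df1$V2
df1$V2recode <- numconv(df1$V2, label = "V2")  # message names V2
```

Passing the label explicitly is the most robust option when numconv() is called from inside another function or loop, where the deparsed expression may be something unhelpful like vec or x[[i]].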

Upvotes: 2

SchaunW

Reputation: 3601

The key is to move the loop over the columns into the function as well. That way the function knows which column it is working on, so it can put the column name in the warning message. The following function recodes the columns of a data frame listed in the 'col_names' argument (if left NULL, it recodes all of them). It returns the original data frame plus the recoded columns, with the string in 'flag' appended to their names.

library(stringr)  # for str_trim()

df1 <- data.frame (
  V1 = c("   $25.25", "4,828", "      $7,253"), 
  V2 = c( "THIS is bad data", "725", "*error"))

numconv <- function(df, col_names = NULL, flag = "recode"){

  # default: recode every column
  if(is.null(col_names)) {
    col_names <- colnames(df)
  }
  out <- lapply(seq_along(col_names), function(i) {
    vec <- str_trim(df[,col_names[i]])
    vec <- gsub(",|\\$", "", vec)
    if( sum(!grepl( "[0-9]",vec)) == 0){
      vec <- as.numeric(vec)
    }
    if( sum(!grepl( "[0-9]",vec)) != 0){
      # the column name is known here, so it can go into the message
      print(paste("!!ERROR STRANGE CHARACTERS in", col_names[i], "!!"))
    }
    vec
  })

  out <- data.frame(out, stringsAsFactors = FALSE)
  colnames(out) <- paste(col_names, flag, sep = "")
  cbind(df, out)
}

numconv(df1)
[1] "!!ERROR STRANGE CHARACTERS in V2 !!"
            V1               V2 V1recode         V2recode
1       $25.25 THIS is bad data    25.25 THIS is bad data
2        4,828              725  4828.00              725
3       $7,253           *error  7253.00           *error
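The same bookkeeping can be sketched with Map(), which walks over the columns and their names in parallel, so each column's name is available inside the worker function without indexing. This is a variation, not the answer's code: it swaps in base R's trimws() for str_trim() to avoid the stringr dependency and raises warning() instead of print(), so callers can catch or suppress the message. numconv_cols() is a hypothetical name.

```r
# Sketch: numconv_cols() is a hypothetical variant of the function above.
numconv_cols <- function(df, col_names = colnames(df), flag = "recode") {
  # Map() pairs each selected column with its name
  recoded <- Map(function(vec, nm) {
    vec <- gsub(",|\\$", "", trimws(vec))  # trim, drop commas and dollar signs
    if (all(grepl("[0-9]", vec))) {
      as.numeric(vec)
    } else {
      warning("strange characters in column ", nm, call. = FALSE)
      vec  # leave unconvertible columns as character
    }
  }, df[col_names], col_names)
  names(recoded) <- paste0(col_names, flag)
  cbind(df, as.data.frame(recoded, stringsAsFactors = FALSE))
}

df1 <- data.frame(V1 = c("   $25.25", "4,828", "      $7,253"),
                  V2 = c("THIS is bad data", "725", "*error"),
                  stringsAsFactors = FALSE)
recoded <- numconv_cols(df1)  # warns: strange characters in column V2
```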

Upvotes: 1

Simon O'Hanlon

Reputation: 60000

I feel this is a somewhat hacky way to do it, but you could use substitute() and then strsplit() on the $. This assumes you always refer to the column by name with $. Anyway, you can get the column name this way and paste it into the error message as you wish (call it at the top of the function, before vec is modified):

    # fixed = TRUE: as a regex, "$" is the end-of-string anchor, so match it literally
    x <- strsplit(deparse(substitute(vec)), "$", fixed = TRUE)[[1]][2]
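A less string-dependent variant of the same idea, sketched with a hypothetical helper col_label(): substitute() returns an unevaluated call, and for df1$V2 or df1[["V2"]] the column name is simply the call's third element, so no string splitting is needed. Any other expression falls back to its deparsed text; note that an index expression such as df1[[2]] yields "2" rather than a column name.

```r
# Sketch: col_label() is a hypothetical helper, not part of any package.
col_label <- function(expr) {
  # df$col and df[["col"]] are calls whose third element is the column
  if (is.call(expr) && is.name(expr[[1]]) &&
      as.character(expr[[1]]) %in% c("$", "[[")) {
    as.character(expr[[3]])
  } else {
    deparse(expr)  # fallback: the caller's expression as text
  }
}

numconv <- function(vec) {
  nam <- col_label(substitute(vec))  # capture before vec is modified
  vec <- gsub(",|\\$", "", trimws(vec))
  if (all(grepl("[0-9]", vec))) {
    vec <- as.numeric(vec)
  } else {
    print(paste0("!!ERROR STRANGE CHARACTERS IN ", nam, "!!"))
  }
  vec
}

df1 <- data.frame(V2 = c("THIS is bad data", "725", "*error"),
                  stringsAsFactors = FALSE)
df1$V2recode <- numconv(df1$V2)  # message names V2, as the question asked
```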

Upvotes: 1
