Jorvik77

Reputation: 352

Simple function to classify NA values

I have tried to find an answer to what appears to be a simple question but without any success.

I want to create a function that can operate on different variables in different data frames. All the function needs to do is search a variable for the value "Don't know" and replace it with NA. Done manually, it looks like this:

raw.df$S8[raw.df$S8 == "Don't know"] <- NA

As an exercise in learning R I would like to do this with a function, but I cannot find a way to reference the function's inputs inside it.

In this example code I cannot even create a vector which is a copy of the dataframe variable I want to recode - it is coming out as NULL. So until I know how to do this part, I can't progress to recoding values as NA.

> NADK <- function(df,x) {
+  DDD <<- df$x
+ }
> 
> NADK(raw.df, S8)
> DDD
NULL

Am I right in assuming that I cannot write df$x inside the function and expect R to substitute the column name that was passed in as x?

Upvotes: 0

Views: 342

Answers (1)

John Coleman

Reputation: 52008

Rather than writing a function which hardwires in "Don't know" it seems more flexible to have that as an argument to the function. Something like:

to.na <- function(df,x,na.string){
  df[x][df[x] == na.string] <- NA
  df
}

This returns the altered dataframe.
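
As for why the attempt in the question gives NULL: $ does not evaluate its right-hand side, so df$x looks for a column literally named x rather than the column whose name was passed in. Indexing with [ or [[ and a character string avoids this. A minimal sketch, where get.column and the sample raw.df are made up for illustration:

```r
# Hypothetical helper: extract one column, given its name as a string.
get.column <- function(df, x) {
  df[[x]]  # x is evaluated here, unlike in df$x
}

# Invented sample data standing in for the question's raw.df
raw.df <- data.frame(S8 = c("Yes", "Don't know", "No"))

raw.df$x                  # NULL: there is no column literally called "x"
get.column(raw.df, "S8")  # the S8 column
```

Note that the column name is passed as a quoted string ("S8"), not as a bare name.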

For example, given

df <- data.frame(Name = c("Larry", "Curly", "Moe"), BirthYear = c(1900, 1910, 1920), DeathYear = c("1950", "1960", "Not dead"))

so that df is

   Name BirthYear DeathYear
1 Larry      1900      1950
2 Curly      1910      1960
3   Moe      1920  Not dead

Then:

> df <- to.na(df,"DeathYear","Not dead")
> df
   Name BirthYear DeathYear
1 Larry      1900      1950
2 Curly      1910      1960
3   Moe      1920      <NA>
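
The same call should work for the column in the question. A sketch, assuming raw.df has a column S8 stored as text (the data below are invented for illustration):

```r
# to.na as defined above, repeated so the example runs on its own
to.na <- function(df, x, na.string) {
  df[x][df[x] == na.string] <- NA
  df
}

# Invented sample data standing in for the question's raw.df
raw.df <- data.frame(S8 = c("Yes", "Don't know", "No"),
                     stringsAsFactors = FALSE)

# Pass the column name as a string and reassign the result
raw.df <- to.na(raw.df, "S8", "Don't know")
is.na(raw.df$S8)
```

Because the function returns a modified copy, the result has to be assigned back to raw.df for the change to stick.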

If you are reading the dataframe from a file by using read.table (or associated functions like read.csv) then you might be able to avoid the problem to begin with by using the parameter na.strings. See ?read.table for details.
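
A small sketch of that approach, reading from an inline string through the text argument instead of a real file (the column names and values are invented):

```r
# Invented CSV content; in practice this would be a file path
csv.text <- "id,S8
1,Yes
2,Don't know
3,No"

# Treat both the default "NA" and "Don't know" as missing on input
raw.df <- read.csv(text = csv.text, na.strings = c("NA", "Don't know"))
is.na(raw.df$S8)
```

Listing "NA" alongside "Don't know" keeps the default behaviour for literal NA entries, since supplying na.strings replaces the default rather than adding to it.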

Upvotes: 2
