Reputation: 352
I have tried to find an answer to what appears to be a simple question but without any success.
I want to create a function that operates on different variables in different data frames. All the function needs to do is search for the value "Don't know" and replace it with NA. I would do this manually as follows:
raw.df$S8[raw.df$S8 == "Don't know"] <- NA
As an exercise in learning R, I would like to do this with a function, but I cannot find a way to reference the inputs to the function.
In this example code I cannot even create a vector which is a copy of the dataframe variable I want to recode - it is coming out as NULL. So until I know how to do this part, I can't progress to recoding values as NA.
> NADK <- function(df,x) {
+ DDD <<- df$x
+ }
>
> NADK(raw.df, S8)
> DDD
NULL
Am I right in assuming that I cannot use df$x and expect R to know that df and x come from the function inputs?
Upvotes: 0
Views: 342
Reputation: 52008
Rather than writing a function that hardwires in "Don't know", it seems more flexible to pass that string as an argument to the function. Something like:
to.na <- function(df, x, na.string) {
  df[x][df[x] == na.string] <- NA
  df
}
This returns the altered dataframe.
For example, if
df <- data.frame(Name = c("Larry", "Curly", "Moe"),BirthYear = c(1900, 1910, 1920), DeathYear = c("1950", "1960", "Not dead"))
so that df is
   Name BirthYear DeathYear
1 Larry      1900      1950
2 Curly      1910      1960
3   Moe      1920  Not dead
Then:
> df <- to.na(df,"DeathYear","Not dead")
> df
   Name BirthYear DeathYear
1 Larry      1900      1950
2 Curly      1910      1960
3   Moe      1920      <NA>
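As for why the attempt in the question produced NULL: the `$` operator takes the name on its right-hand side literally, so inside the function df$x looks for a column that is actually called "x" rather than the column named by the argument. Passing the column name as a quoted string and indexing with `[[` (or `[`, as above) avoids this. The following is only a minimal sketch; the toy raw.df and the function name na_dk are made up for illustration and are not from the question's real data.

```r
# Hypothetical toy data standing in for raw.df from the question
raw.df <- data.frame(S8 = c("Yes", "Don't know", "No"),
                     stringsAsFactors = FALSE)

col <- "S8"

# `$` uses the name literally: this looks for a column called "col"
raw.df$col        # NULL, because no such column exists

# `[[` evaluates its argument: this looks up the column named by `col`
raw.df[[col]]

# Illustrative variant of the recoding function built on `[[`
na_dk <- function(df, x, na.string = "Don't know") {
  df[[x]][df[[x]] == na.string] <- NA
  df  # return the modified copy; the caller's data frame is unchanged
}

recoded <- na_dk(raw.df, "S8")
```

Note that the column name has to be passed in quotes ("S8", not S8), and that the result must be assigned back, since the function works on a copy.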
If you are reading the dataframe from a file using read.table (or related functions such as read.csv), you might be able to avoid the problem altogether by using the na.strings parameter. See ?read.table for details.
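A minimal sketch of the na.strings approach, assuming the data arrives as CSV. The inline csv_text is a made-up stand-in for a real file; with an actual file you would pass its path as the first argument instead of using text=.

```r
# Hypothetical inline CSV standing in for a real file
csv_text <- "Name,BirthYear,DeathYear
Larry,1900,1950
Curly,1910,1960
Moe,1920,Not dead"

# Treat both the default "NA" and "Not dead" as missing while reading
df <- read.csv(text = csv_text, na.strings = c("NA", "Not dead"))

df
str(df)
```

A side benefit is that, with the placeholder string gone before type conversion, DeathYear can be read as a numeric column rather than as text.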
Upvotes: 2