Reputation: 6094
i'm struggling with how to read the quoted text NULL as an NA string when importing R character columns.. i am just trying to get that 3rd row to be read in as missing, but instead i'm getting the text "NULL"
# simple dummy dataset with six rows and two columns
w <- mtcars
w[ , 'cyl' ] <- as.character( w[ , 'cyl' ] )
w[ 3 , 'cyl' ] <- 'NULL'
w <- w[ 1:6 , c( 'cyl' , 'mpg' ) ]
# temporary filepath
tf1 <- paste0( tempfile() , '.csv' )
# save dummy dataset to disk
data.table::fwrite( w , tf1 , quote = TRUE )
# dummy dataset looks like this:
readLines(tf1)
# both of these work if columns are read as a numeric type
data.table::fread( tf1 , na.strings = '"NULL"' )
data.table::fread( tf1 , na.strings = '\"NULL\"' )
# attempt at reading character column all fail
data.table::fread( tf1 , colClasses = c( 'character' , 'numeric' ) , na.strings = 'NULL' )
data.table::fread( tf1 , colClasses = c( 'character' , 'numeric' ) , na.strings = '"NULL"' )
data.table::fread( tf1 , colClasses = c( 'character' , 'numeric' ) , na.strings = '\"NULL\"' )
data.table::fread( tf1 , colClasses = c( 'character' , 'numeric' ) , na.strings = '\\"NULL\\"' )
data.table::fread( tf1 , colClasses = c( 'character' , 'numeric' ) , na.strings = '`\\"NULL\\"`' )
any ideas about this would be appreciated! thanks
Upvotes: 2
Views: 123
Reputation: 1
Because of how the string is quoted in the CSV, your attempts to set na.strings to "NULL" are not being recognized correctly when the column is specified as character. To read the character column and interpret "NULL" as NA, you can import the data without specifying colClasses, then replace the "NULL" strings with NA.
library(data.table)
# Read the data without specifying colClasses
w_data <- fread(tf1, na.strings = c("NULL", '"NULL"', '\"NULL\"'))
# Replace "NULL" with NA
w_data[cyl == "NULL", cyl := NA]
First, import the dataset with fread() using the na.strings argument to cover all variations of "NULL" that may exist (including quoted versions). After importing, you can replace instances of "NULL" in the cyl column with NA directly. This should allow you to handle "NULL" strings as missing values appropriately.
Upvotes: 0