Reputation: 357
In R I am reading in a data set with some numeric values (incomes) using the following:
Landscape <- tbl_df(read.table("Landscape_Cleaned.txt", header=TRUE, stringsAsFactors = FALSE))
That works, but the incomes and all other number values are being read in as class integer. This causes me headaches later, because I would really like all numbers to be read in as numeric.
I know I can use as.numeric on each column in turn, but I am wondering whether there is a way, like stringsAsFactors = FALSE, to get the original read statement to treat all numbers as numeric rather than integer?
Upvotes: 1
Views: 344
Reputation: 5663
As several folks have already commented, use colClasses to specify the type of each column explicitly:
read.table('foo.txt', header=TRUE, colClasses=c('character', 'numeric', 'numeric', 'logical', ...))
You can also give names to the colClasses vector that match the names in the header, in which case you don't have to specify the type of every column; each column that you leave out of colClasses just gets read with the default type.
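A minimal sketch of the named form, assuming a small made-up table in place of the real file (the column names name, income and age and the inline text are hypothetical, used only for illustration):

```r
# Hypothetical sample data standing in for the real input file.
txt <- "name income age
alice 52000 34
bob 48000 29"

# Only 'income' is named in colClasses, so 'name' and 'age'
# fall back to read.table's default type detection.
d <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE,
                colClasses = c(income = "numeric"))

# Inspect the resulting class of each column.
col_classes <- sapply(d, class)
print(col_classes)
```

Here income should come back as numeric, while age, which was left out of colClasses, is still detected as integer.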
As an aside, I tend to force 'character' mode for each column whenever I read a data set and then cast the types explicitly afterwards, so that I can put comments explaining why I'm casting each column the way I am.
df = read.csv('foo.csv', header=TRUE, colClasses=c('character'))
## Cast first column as POSIX timestamp. Cast all 'Size' columns to double
## because R doesn't support int64. Cast all 'num' columns to int because
## they don't exceed int32 limits.
df$timestamp = as.POSIXct(df$timestamp, format='%Y%m%d_%H:%M:%S')
sizeColumnIdxs <- grep('Size', names(df))
## Convert column by column: as.double() on a multi-column data frame fails.
df[sizeColumnIdxs] = lapply(df[sizeColumnIdxs], as.double)
countColumnIdxs <- grep('num', names(df))
df[countColumnIdxs] = lapply(df[countColumnIdxs], as.integer)
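For anyone who wants to try the read-as-character-then-cast pattern without a file on disk, here is a self-contained sketch. The inline CSV and its column names (inSize, outSize, numReads) are made up for illustration, the tz = "UTC" argument is an assumption added to keep the timestamps reproducible, and the casts use lapply so that they also work when grep matches more than one column:

```r
# Hypothetical inline CSV standing in for foo.csv.
csv <- "timestamp,inSize,outSize,numReads
20140102_03:04:05,1024,2048,7
20140102_03:05:05,4096,512,9"

# Read every column as character; a single class is recycled across columns.
df <- read.csv(text = csv, header = TRUE, colClasses = "character")

# Cast the first column to a POSIX timestamp (UTC is assumed here).
df$timestamp <- as.POSIXct(df$timestamp, format = "%Y%m%d_%H:%M:%S", tz = "UTC")

# Cast all 'Size' columns to double, one column at a time.
sizeCols <- grep("Size", names(df))
df[sizeCols] <- lapply(df[sizeCols], as.double)

# Cast all 'num' columns to integer, one column at a time.
countCols <- grep("num", names(df))
df[countCols] <- lapply(df[countCols], as.integer)

# First class of each column (POSIXct objects carry two classes).
final_classes <- sapply(df, function(x) class(x)[1])
print(final_classes)
```

The same lapply idiom could be used after a default read to turn every integer column into numeric, which is what the question asks for, though colClasses avoids the extra pass.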
Upvotes: 1