user311020193

Reputation: 357

Get read in statement in R to consider numbers as numeric class?

In R I am pulling in a data set with some numbers (incomes) using the following:

Landscape <- tbl_df(read.table("Landscape_Cleaned.txt", header=TRUE, stringsAsFactors = FALSE))

and that is great, but the incomes and all other number values are being read in as class integer. This is causing me headaches later, because I would really like all numbers to be read in as numeric.

I know I can use as.numeric on each column in turn, but I am wondering if there is a way, like stringsAsFactors=FALSE, to get the original read-in statement to treat all numbers as numeric rather than integer?

Upvotes: 1

Views: 344

Answers (1)

dg99

Reputation: 5663

As several folks have already commented, use colClasses to specify the type of each column explicitly:

read.table('foo.txt', header=TRUE, colClasses=c('character', 'numeric', 'numeric', 'logical', ...))

You can also give names to the colClasses vector that match the names in the header, in which case you don't have to specify the type of every column; then, each column that you leave out of colClasses just gets read with the default type.
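
A minimal sketch of the named form, assuming a small made-up file with columns `name`, `income`, and `employed` (the file contents and column names are hypothetical, not taken from the question's data set):

```r
# Hypothetical sample data written to a temporary file for illustration.
path <- tempfile(fileext = ".txt")
writeLines(c("name income employed",
             "alice 52000 TRUE",
             "bob 48000 FALSE"), path)

# Default behaviour: a column of whole numbers is read as integer.
default_df <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
class(default_df$income)   # "integer"

# Named colClasses: only 'income' is forced to numeric;
# the columns left out are read with their default types.
typed_df <- read.table(path, header = TRUE, stringsAsFactors = FALSE,
                       colClasses = c(income = "numeric"))
class(typed_df$income)     # "numeric"
```

Naming only the columns you care about keeps the call short when the file has many columns, at the cost of having to know the header names in advance.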

As an aside, I tend to force 'character' mode for each column whenever I read a data set and then cast the types explicitly afterwards, so that I can put comments explaining why I'm casting each column the way I am.

df = read.csv('foo.csv', header=TRUE, colClasses=c('character'))

## Cast first column as POSIX timestamp.  Cast all 'Size' columns to double
## because R doesn't support int64.  Cast all 'num' columns to int because
## they don't exceed int32 limits.
df$timestamp = as.POSIXct(df$timestamp, format='%Y%m%d_%H:%M:%S')

sizeColumnIdxs <- grep('Size', names(df))
## Convert column by column; as.double() cannot coerce a multi-column data frame.
df[sizeColumnIdxs] = lapply(df[sizeColumnIdxs], as.double)

countColumnIdxs <- grep('num', names(df))
## Same column-by-column approach for the integer columns.
df[countColumnIdxs] = lapply(df[countColumnIdxs], as.integer)
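
If the goal is simply to turn every integer column into numeric after a default read, as the question asks, one possible sketch is shown here. It uses a small hypothetical data frame in place of the real file, and the names `landscape`, `income`, and `households` are assumptions made for illustration:

```r
# Hypothetical data frame standing in for the result of read.table().
landscape <- data.frame(id = c("a", "b"),
                        income = c(52000L, 48000L),
                        households = c(3L, 5L),
                        stringsAsFactors = FALSE)

# Find the integer columns and convert only those to double.
is_int <- vapply(landscape, is.integer, logical(1))
landscape[is_int] <- lapply(landscape[is_int], as.numeric)

# Inspect the resulting column classes.
sapply(landscape, class)
```

This avoids listing column types by hand, but it runs after the read rather than inside the read.table call itself.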

Upvotes: 1
