Reputation: 13113
I have data sets in which a hyphen is used in place of the number zero, as in the example data set my.data
below. I can replace the hyphens with zeroes, but then have trouble converting the affected columns to numeric. My actual data sets are very large with many columns and I do not know which columns will contain hyphens. The data sets are also too large and complex for me to feel comfortable using Find and Replace within the data sets themselves before reading them into R.
I think the first three columns of the actual data sets will be character and the rest of the columns should be numeric (if it were not for the hyphens). Is there an efficient and general way to convert all of the columns with hyphens to numeric without knowing which columns those are?
I present one way below, but it seems rather cumbersome.
I have found many similar posts here, but they seem generally to be asking how to replace missing observations with something else or how to convert a specific known factor column to character or numeric format. I have not found any posts that deal with this specific problem, where the specific columns requiring conversion are unknown, although I could have overlooked them. Thank you for any advice.
my.data <- read.table(text = "
landuse units grade Clay Lincoln Basin McCartney Maple
apple acres AAA 1 - 3 4 6
apple acres AA 1000 900 NA NA 700
pear acres AA 10.0 20 NA 30.0 -
peach acres AAA 500 NA 350 300 200
", sep = "", header = TRUE, stringsAsFactors = FALSE, na.string=c('NA'))
my.data
str(my.data)
my.data[my.data == '-'] = '0'
as.numeric(my.data[,4:dim(my.data)[2]])
# Error: (list) object cannot be coerced to type 'double'
# The two lines below work but are too specific
# my.data$Lincoln <- as.numeric(my.data$Lincoln)
# my.data$Maple <- as.numeric(my.data$Maple)
str(my.data)
# Here I unlist the columns I want to be numeric,
# convert them to a numeric matrix and then create a data frame.
# But this seems cumbersome.
un.my.data <- unlist(my.data[,4: dim(my.data)[2]])
un.my.data <- as.numeric(un.my.data)
my.data.2 <- matrix(un.my.data, nrow=dim(my.data)[1], byrow=F)
colnames(my.data.2) <- names(my.data)[4:dim(my.data)[2]]
new.data <- data.frame(my.data[,1:3], my.data.2)
new.data
str(new.data)
Upvotes: 3
Views: 2376
Reputation: 81693
Both conversion of hyphens and transformation to numeric can be done with one command using sapply
and gsub
:
cbind(my.data[ , 1:3],
sapply(my.data[ , -(1:3)], function(x) as.numeric(gsub("^-$", "0", x))))
landuse units grade Clay Lincoln Basin McCartney Maple
1 apple acres AAA 1 0 3 4 6
2 apple acres AA 1000 900 NA NA 700
3 pear acres AA 10 20 NA 30 0
4 peach acres AAA 500 NA 350 300 200
Upvotes: 4
Reputation: 179448
Use a regular expression to replace -
with 0
, then convert to numeric. Wrap all of this in lapply
:
my.data[-(1:3)] <- lapply(
my.data[-(1:3)],
function(x)as.numeric(gsub("-", 0, x))
)
my.data
landuse units grade Clay Lincoln Basin McCartney Maple
1 apple acres AAA 1 0 3 4 6
2 apple acres AA 1000 900 NA NA 700
3 pear acres AA 10 20 NA 30 0
4 peach acres AAA 500 NA 350 300 200
Upvotes: 3
Reputation: 8267
Instead of
as.numeric(my.data[,4:dim(my.data)[2]])
try this:
my.data[,-(1:3)] <-
matrix(as.numeric(as.matrix(my.data[,-(1:3))),nrow=nrow(my.data))
str(my.data)
Note that I took "all but the first three columns" with [,-(1:3)]
, which is slightly more legible than [,4: dim(my.data)[2]]
.
Upvotes: 0