Mark Miller
Mark Miller

Reputation: 13113

convert hyphen to zero in data frame then convert affected columns to numeric in R

I have data sets in which a hyphen is used in place of the number zero, as in the example data set my.data below. I can replace the hyphens with zeroes, but then have trouble converting the affected columns to numeric. My actual data sets are very large with many columns and I do not know which columns will contain hyphens. The data sets are also too large and complex for me to feel comfortable using Find and Replace within the data sets themselves before reading them into R.

I think the first three columns of the actual data sets will be character and the rest of the columns should be numeric (if it were not for the hyphens). Is there an efficient and general way to convert all of the columns with hyphens to numeric without knowing which columns those are?

I present one way below, but it seems rather cumbersome.

I have found many similar posts here, but they seem generally to be asking how to replace missing observations with something else or how to convert a specific known factor column to character or numeric format. I have not found any posts that deal with this specific problem, where the specific columns requiring conversion are unknown, although I could have overlooked them. Thank you for any advice.

my.data <- read.table(text = "
landuse units grade Clay    Lincoln    Basin     McCartney     Maple
apple   acres AAA     1         -          3             4         6
apple   acres AA   1000       900         NA            NA       700
pear    acres AA   10.0        20         NA          30.0         -
peach   acres AAA   500        NA        350           300       200
", sep = "", header = TRUE, stringsAsFactors = FALSE, na.string=c('NA'))

my.data
str(my.data)

my.data[my.data == '-'] = '0'

as.numeric(my.data[,4:dim(my.data)[2]])

# Error: (list) object cannot be coerced to type 'double'

# The two lines below work but are too specific
# my.data$Lincoln <- as.numeric(my.data$Lincoln)
# my.data$Maple   <- as.numeric(my.data$Maple)

str(my.data)

# Here I unlist the columns I want to be numeric,
# convert them to a numeric matrix and then create a data frame.
# But this seems cumbersome.

un.my.data <- unlist(my.data[,4: dim(my.data)[2]])
un.my.data <- as.numeric(un.my.data)

my.data.2 <- matrix(un.my.data, nrow=dim(my.data)[1], byrow=F)
colnames(my.data.2) <- names(my.data)[4:dim(my.data)[2]]

new.data <- data.frame(my.data[,1:3], my.data.2)
new.data
str(new.data)

Upvotes: 3

Views: 2376

Answers (3)

Sven Hohenstein
Sven Hohenstein

Reputation: 81693

Both conversion of hyphens and transformation to numeric can be done with one command using sapply and gsub:

cbind(my.data[ , 1:3],
      sapply(my.data[ , -(1:3)], function(x) as.numeric(gsub("^-$", "0", x))))


  landuse units grade Clay Lincoln Basin McCartney Maple
1   apple acres   AAA    1       0     3         4     6
2   apple acres    AA 1000     900    NA        NA   700
3    pear acres    AA   10      20    NA        30     0
4   peach acres   AAA  500      NA   350       300   200

Upvotes: 4

Andrie
Andrie

Reputation: 179448

Use a regular expression to replace - with 0, then convert to numeric. Wrap all of this in lapply:

my.data[-(1:3)] <- lapply(
  my.data[-(1:3)], 
  function(x)as.numeric(gsub("-", 0, x))
)

my.data
  landuse units grade Clay Lincoln Basin McCartney Maple
1   apple acres   AAA    1       0     3         4     6
2   apple acres    AA 1000     900    NA        NA   700
3    pear acres    AA   10      20    NA        30     0
4   peach acres   AAA  500      NA   350       300   200

Upvotes: 3

Stephan Kolassa
Stephan Kolassa

Reputation: 8267

Instead of

as.numeric(my.data[,4:dim(my.data)[2]])

try this:

my.data[,-(1:3)] <-
  matrix(as.numeric(as.matrix(my.data[,-(1:3))),nrow=nrow(my.data))
str(my.data)

Note that I took "all but the first three columns" with [,-(1:3)], which is slightly more legible than [,4: dim(my.data)[2]].

Upvotes: 0

Related Questions