Ed_Gravy

Reputation: 2033

character to numeric conversion problem in R

I have a large time series dataset whose numeric results are stored in General format in MS Excel, and R picks up the datatype as character. I tried gsub(",", "", dummy), but it did not work: the dataset does not contain any commas or any other visible special characters apart from the decimal point. Values are either positive or negative, with one NA, and the values have differing numbers of decimal places.

How can I convert the values without having to deal with unexpected NAs after converting to numeric? One thing to note is that, once converted to numeric, some of the values are displayed in scientific notation, like 12.1 e+03, and other values with four decimal places.

dummy = c("12.1", "42000", "1.2145", "12.25", "N/A", "323.369", "-1.235", "335", "0")

# Convert to numeric
dummy = gsub(",", "", dummy)
dummy = as.numeric(dummy)

Error

Warning message:
NAs introduced by coercion

Upvotes: 1

Views: 1948

Answers (1)

Alexander Christensen

Reputation: 403

Changing N/A to NA solves this issue, because as.numeric() cannot parse the string "N/A" and warns when it coerces it to NA:

# N/A to NA
dummy = c("12.1", "42000", "1.2145", "12.25", NA, "323.369", "-1.235", "335", "0")

# Convert to numeric
dummy = gsub(",", "", dummy)
dummy = as.numeric(dummy)
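
As a minimal sketch of the same idea when the "N/A" strings are already in the vector (assuming the placeholder is exactly the string "N/A", as in the question): mark those entries as real missing values first, then convert. The names raw and values are only illustrative. The format() call at the end affects only how the numbers are printed, not what is stored, which may help with the scientific notation display mentioned in the question.

```r
# Illustrative input: the question's values, with the missing entry as the string "N/A"
raw <- c("12.1", "42000", "1.2145", "12.25", "N/A", "323.369", "-1.235", "335", "0")

# Mark the placeholder as a real missing value before converting
raw[raw %in% "N/A"] <- NA

# No coercion warning now, because every remaining string parses as a number
values <- as.numeric(raw)

# Printing only: show fixed notation instead of scientific notation
format(values, scientific = FALSE)
```

If a warning still appears after this step, something other than "N/A" is failing to parse; comparing is.na(values) with is.na(raw) should point to the offending entries.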

To do so for your entire dataset, you can use:

# Across columns (for matrices)
data <- apply(data, 2, function(x){
  ifelse(x == "N/A", NA, x)
})

# Then convert characters to numeric (for matrices)
data <- apply(data, 2, as.numeric)

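# Across columns (for data frames); data[] keeps the data frame structure,
# whereas assigning lapply()'s result to data directly would turn it into a list
data[] <- lapply(data, function(x){
  ifelse(x == "N/A", NA, x)
})

# Then convert characters to numeric (for data frames)
data[] <- lapply(data, as.numeric)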
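
A small self-contained sketch of the data frame case, using a hypothetical two-column data frame (the column names a and b are made up for illustration). Assigning into data[] keeps the result a data frame, since lapply() on its own returns a plain list.

```r
# Hypothetical data frame standing in for the imported Excel sheet
data <- data.frame(
  a = c("12.1", "N/A", "-1.235"),
  b = c("42000", "335", "0"),
  stringsAsFactors = FALSE
)

# Replace the "N/A" placeholder and convert each column in one pass;
# data[] <- preserves the data.frame structure and column names
data[] <- lapply(data, function(x) {
  x[x %in% "N/A"] <- NA
  as.numeric(x)
})

# Inspect the resulting column types
str(data)
```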

Update: the code now distinguishes between the *apply functions for different object types in R (apply() for matrices, lapply() for data frames). Thanks to user20650 for pointing this out.

Upvotes: 2
