I am trying to convert a data frame that contains numbers and blanks to numeric. Currently, the numbers are stored as factors, and some of them contain commas (",") as thousands separators.
df <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""), num2 = c("","1,012","","202"))
df
         num1  num2
1 123,456,789
2   1,234,567 1,012
3       1,234
4               202
Removing the "," and converting to numeric:
df2 = as.numeric(gsub(",","",df))
Warning message:
NAs introduced by coercion
Interestingly, if I apply the same functions column by column, it works:
df$num1 = as.numeric(gsub(",","",df$num1))
df$num2 = as.numeric(gsub(",","",df$num2))
df
       num1 num2
1 123456789   NA
2   1234567 1012
3      1234   NA
4        NA  202
My questions are:
1. What is the cause, and is there a way to avoid converting column by column, since the actual data frame has many more columns?
2. What would be the best way to remove the NAs, or replace them with 0s, for later numeric operations? I know I can use gsub to do so, but I am wondering if there is a better way.
We can use replace_na (from tidyr) after replacing the "," with "" using str_replace_all:
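library(dplyr)
library(stringr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr or stringr
df %>%
  mutate_all(list(~ str_replace_all(., ",", "") %>%  # strip the commas
                    as.numeric %>%                   # blanks ("") become NA
                    replace_na(0)))                  # NA becomes 0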
#       num1 num2
#1 123456789    0
#2   1234567 1012
#3      1234    0
#4         0  202
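Since dplyr 1.0.0, the scoped verbs such as mutate_all() are superseded by across(). An equivalent sketch, assuming dplyr >= 1.0.0 together with the same stringr and tidyr helpers (str_remove_all() is simply stringr's shorthand for replacing a pattern with ""), might look like this:

```r
library(dplyr)
library(stringr)
library(tidyr)   # replace_na()

df <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""),
                 num2 = c("", "1,012", "", "202"))

out <- df %>%
  mutate(across(everything(),
                ~ .x %>%
                  str_remove_all(",") %>%   # drop the thousands separators
                  as.numeric() %>%          # blank strings become NA
                  replace_na(0)))           # NA becomes 0
out
```

across(everything(), ...) applies the same lambda to every column, so the number of columns in the real data does not matter; swap everything() for a tidyselect helper such as starts_with("num") if only some columns should be converted.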
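The issue with gsub/sub is that they work on a character vector, as described in ?gsub:
x, text - a character vector where matches are sought, or an object which can be coerced by as.character to a character vector. Long vectors are supported.
A data frame is a list, so as.character() turns each whole column into a single deparsed string. gsub(",", "", df) therefore returns one element per column rather than one per cell, and as.numeric() cannot parse those strings, which is where the NAs and the warning come from.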
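A quick way to see this with the sample data from the question (built here with stringsAsFactors = FALSE so the columns are character, as is the default since R 4.0.0; with factor columns the call fails the same way):

```r
df <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""),
                 num2 = c("", "1,012", "", "202"),
                 stringsAsFactors = FALSE)

# gsub() coerces the whole data frame with as.character(), which deparses
# each column into one string, so the result has one element per column
res <- gsub(",", "", df)
print(length(res))   # 2 strings, not 8 cells
print(res[1])        # the deparsed num1 column with every comma stripped

# Neither string is a valid number, so as.numeric() returns NA for both
print(suppressWarnings(as.numeric(res)))
```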
In base R, we can loop over the columns with lapply, apply gsub to each one, and assign the output back with df[] <- so that the result stays a data frame:
df[] <- lapply(df, function(x) as.numeric(gsub(",", "", x)))
df[is.na(df)] <- 0 # change the NA elements to 0
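If the real data frame also has columns that should not be converted (an ID column, say), the same loop can be restricted to a subset. A minimal sketch: the helper name commas_to_numeric and the assumption that the numeric columns are the ones whose names start with "num" are both hypothetical, so adjust the selection to the actual data:

```r
df <- data.frame(id   = c("a", "b", "c", "d"),
                 num1 = c("123,456,789", "1,234,567", "1,234", ""),
                 num2 = c("", "1,012", "", "202"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: strip commas, convert, and turn NA (from blanks) into 0.
# as.character() makes factor columns use their labels, not their integer codes.
commas_to_numeric <- function(x) {
  out <- as.numeric(gsub(",", "", as.character(x)))
  out[is.na(out)] <- 0
  out
}

# Assumed naming convention: only the columns starting with "num" are converted
cols <- grep("^num", names(df))
df[cols] <- lapply(df[cols], commas_to_numeric)
str(df)
```

Replacing NA inside the helper keeps the NA-to-0 step local to the converted columns, so genuine NAs in other columns are left alone.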