T-T

Reputation: 713

R: Applying gsub to data frames returns NAs

I am trying to convert a data frame that contains numbers and blanks to numeric. Currently, the numbers are stored as factors, and some of them contain "," as a thousands separator.

df <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""), num2 = c("","1,012","","202"))
df
         num1  num2
1 123,456,789      
2   1,234,567 1,012
3       1,234      
4               202

Remove "," and convert to numeric format:

df2 = as.numeric(gsub(",","",df))
Warning message:
NAs introduced by coercion

Interestingly, if I apply the same function column by column, it works:

df$num1 = as.numeric(gsub(",","",df$num1)) 
df$num2 = as.numeric(gsub(",","",df$num2))
df
       num1 num2
1 123456789   NA
2   1234567 1012
3      1234   NA
4        NA  202

My questions are:

1. What causes this, and is there a way to avoid converting column by column, since the actual data frame has many more columns?
2. What is the best way to remove the NAs or replace them with 0s for later numeric operations? I know I can use gsub to do so, but I am wondering if there is a better way.

Upvotes: 1

Views: 964

Answers (1)

akrun

Reputation: 887571

We can use replace_na (from tidyr) after replacing the "," with "" using str_replace_all:

library(dplyr)
library(stringr)
library(tidyr)  # provides replace_na
df %>% 
   mutate_all(list(~ str_replace_all(., ",", "") %>% 
                        as.numeric %>%
                        replace_na(0)))
#       num1 num2
#1 123456789    0
#2   1234567 1012
#3      1234    0
#4         0  202

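The issue is that gsub/sub work on a vector, not on a data frame. As described in ?gsub, a non-character argument is coerced with as.character, which turns each column of a data frame into a single string, so as.numeric then returns NA: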

x, text - a character vector where matches are sought, or an object which can be coerced by as.character to a character vector. Long vectors are supported.
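To see the coercion in action, here is a small sketch using the data from the question. It assumes character columns (stringsAsFactors = FALSE); the variable names coerced, stripped and result are only for illustration.

```r
# Same data as in the question, stored as character columns
df <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""),
                 num2 = c("", "1,012", "", "202"),
                 stringsAsFactors = FALSE)

# as.character() on a data frame returns one string per column,
# not one string per cell
coerced <- as.character(df)
length(coerced)   # 2: one element for num1, one for num2

# gsub() applies the same coercion, so it also returns 2 strings,
# each a deparsed column rather than a number
stripped <- gsub(",", "", df)
length(stripped)  # 2

# Neither string is a valid number, hence the NAs and the warning
result <- suppressWarnings(as.numeric(stripped))
result            # NA NA
```

This is why the column-by-column version works: there gsub receives an actual vector each time.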

We can loop over the columns with lapply, apply gsub to each one, and assign the output back to the original data frame:

df[] <- lapply(df, function(x) as.numeric(gsub(",", "", x))) 
df[is.na(df)] <- 0 # change the NA elements to 0
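If this has to be done for several data frames, the two lines above can be wrapped in a small function. This is only a sketch: strip_commas_to_numeric and its na_fill argument are hypothetical names, and it assumes every column should become numeric.

```r
# Hypothetical helper: remove "," from every column, convert to numeric,
# and optionally replace the resulting NAs with a fill value
strip_commas_to_numeric <- function(data, na_fill = NULL) {
  data[] <- lapply(data, function(col) {
    # as.character() makes this safe for factor columns as well
    as.numeric(gsub(",", "", as.character(col), fixed = TRUE))
  })
  if (!is.null(na_fill)) {
    data[is.na(data)] <- na_fill
  }
  data
}

df <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""),
                 num2 = c("", "1,012", "", "202"))

out <- strip_commas_to_numeric(df, na_fill = 0)
out
```

Leaving na_fill as NULL keeps the NAs, which may be preferable if blanks should stay distinguishable from real zeros (for example, when using mean(x, na.rm = TRUE) later).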

Upvotes: 1
