Sri Sreshtan

Reputation: 41

While converting a character variable to integer, I get the warning "NAs introduced by coercion". How can I avoid it?

I tried to convert a character variable to an integer variable using the as.integer function. However, when the code is executed, every value in the column comes back as NA. The code is as follows:

library(tidyverse)
coal_data <- read.csv("http://594442.youcanlearnit.net/coal.csv", skip = 2)
coal_data %>% glimpse()
colnames(coal_data)[1] <- "region"
coal_long <- gather(coal_data, 'year', 'coal_consumption', -region)
coal_long %>% glimpse()
coal_long %>% separate(year, into = c("x", "year"), sep = "X")%>%
    select(-x)%>% glimpse()
class(coal_long$year)
coal_long$year <- as.integer(coal_long$year)

The output was as follows:

coal_long %>% glimpse()

    Rows: 6,960
    Columns: 3
    $ region           <fct> "North America", "Bermuda", "Canada", "Greenland", "Mexico",...
    $ year             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
    $ coal_consumption <chr> "16.45179", "0", "0.96156", "0.00005", "0.10239", "0", "15.3...

The expected output is the year column in integer form rather than NA. Many thanks in advance for looking into this.

Upvotes: 0

Views: 539

Answers (3)

Chuck P

Reputation: 3923

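Assign the result of separate() back to coal_long so the X is actually removed before converting. You may as well make coal_consumption a double while you're at it, by passing na.strings = "--" to read.csv():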

library(tidyverse)

coal_data <- read.csv("http://594442.youcanlearnit.net/coal.csv", skip = 2, na.strings = "--")

colnames(coal_data)[1] <- "region"
coal_long <- gather(coal_data, 'year', 'coal_consumption', -region)
coal_long %>% glimpse()
#> Rows: 6,960
#> Columns: 3
#> $ region           <chr> "North America", "Bermuda", "Canada", "Greenland", "…
#> $ year             <chr> "X1980", "X1980", "X1980", "X1980", "X1980", "X1980"…
#> $ coal_consumption <dbl> 16.45179, 0.00000, 0.96156, 0.00005, 0.10239, 0.0000…
coal_long <- coal_long %>% separate(year, into = c("x", "year"), sep = "X") %>%
  select(-x) %>% glimpse()
#> Rows: 6,960
#> Columns: 3
#> $ region           <chr> "North America", "Bermuda", "Canada", "Greenland", "…
#> $ year             <chr> "1980", "1980", "1980", "1980", "1980", "1980", "198…
#> $ coal_consumption <dbl> 16.45179, 0.00000, 0.96156, 0.00005, 0.10239, 0.0000…
class(coal_long$year)
#> [1] "character"
coal_long$year <- as.integer(str_remove(coal_long$year, "X"))
glimpse(coal_long)
#> Rows: 6,960
#> Columns: 3
#> $ region           <chr> "North America", "Bermuda", "Canada", "Greenland", "…
#> $ year             <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980…
#> $ coal_consumption <dbl> 16.45179, 0.00000, 0.96156, 0.00005, 0.10239, 0.0000…
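
To see why na.strings matters here, the following minimal sketch reads a tiny inline CSV instead of the real file. The three rows and their values are made up for illustration; only the "--" placeholder is taken from the data shown in this thread. stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
# Made-up three-row stand-in for coal.csv; "--" marks a missing value.
csv_text <- "region,X1980,X1981
Canada,0.9,1.0
Aruba,--,--
Mexico,0.1,0.2"

# Without na.strings, the "--" entries force the column to be read as text.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$X1980)     # "character"

# With na.strings = "--", the placeholder becomes NA and the column is numeric.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "--")
class(clean$X1980)   # "numeric"
is.na(clean$X1980)   # FALSE  TRUE FALSE
```

Handling the placeholder at read time avoids a second "NAs introduced by coercion" warning that would appear if coal_consumption were converted with as.numeric() later.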

Upvotes: 3

thorepet

Reputation: 461

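You need to assign the result back to coal_long after removing the X from the year column. In your code the output of separate() is only printed, so coal_long$year still holds values like "X1980", which as.integer() cannot parse and turns into NA.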

coal_long <- coal_long %>% 
  separate(year, into = c("x", "year"), sep = "X") %>% 
  select(-x) %>% 
  glimpse()

coal_long$year <- as.integer(coal_long$year)

coal_long %>% glimpse()

Rows: 6,960
Columns: 3
$ region           <fct> "North America", "Bermuda", "Canada", "Greenland", "Mexico", "Saint Pierre and Miquelon", "United States", "Cent…
$ year             <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980…
$ coal_consumption <chr> "16.45179", "0", "0.96156", "0.00005", "0.10239", "0", "15.38779", "0.42011", "0", "0", "0.03476", "--", "0", "0…
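
The difference between printing a pipeline and assigning its result can be reproduced on a small scale. This is only a sketch: toy is a made-up two-row stand-in for coal_long, not the real data.

```r
library(tidyr)   # separate()
library(dplyr)   # %>% and select()

# Made-up two-row stand-in for coal_long.
toy <- data.frame(region = c("Canada", "Mexico"),
                  year = c("X1980", "X1981"),
                  stringsAsFactors = FALSE)

# The pipeline returns a new data frame; without assignment, toy is unchanged.
toy %>% separate(year, into = c("x", "year"), sep = "X") %>% select(-x)
toy$year                                   # still "X1980" "X1981"
na_years <- suppressWarnings(as.integer(toy$year))
na_years                                   # NA NA, as in the question

# Assigning the result back keeps the cleaned column, so the conversion works.
toy <- toy %>% separate(year, into = c("x", "year"), sep = "X") %>% select(-x)
toy$year <- as.integer(toy$year)
toy$year                                   # 1980 1981
```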

Upvotes: 2

Jeff Bezos

Reputation: 2253

You need to remove the letters from coal_long$year before you convert to an integer. Try something like this.

coal_long$year # X1980 X1981 X1982 X1983, etc.
as.integer(str_remove(coal_long$year, "X"))

Here's a more generic approach that extracts all digits from the string before converting.

as.integer(str_extract(coal_long$year, "\\d+"))
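
As a rough comparison of the two approaches, the sketch below runs both on a made-up vector. Only the "X1980" style comes from the question; the other label formats are hypothetical, added to show where the two calls differ.

```r
library(stringr)

# Made-up labels: the first two mimic the question, the last two are variations.
labels <- c("X1980", "X1981", "year_1982", "1983")

# Removing a literal "X" only helps when "X" is the sole non-digit part.
by_remove <- suppressWarnings(as.integer(str_remove(labels, "X")))
by_remove    # 1980 1981   NA 1983

# Extracting the first run of digits copes with any surrounding text.
by_extract <- as.integer(str_extract(labels, "\\d+"))
by_extract   # 1980 1981 1982 1983
```

If a label contains no digits at all, str_extract() returns NA for it, so the result is still NA for that element, just without the coercion warning.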

Upvotes: 1
