Helena

Reputation: 87

summarise function in R

I am trying to create an R data frame that includes some numerical variables. While doing this, I made a typo whose result looks odd to me, and I would like to understand why (I am surely missing something here).

I have looked around for a possible explanation but haven't found what I am looking for.

library("dplyr")
library("tidyr")

# One-row example; the values match the results quoted below (10.5 and 11.5)
data <- 
  data.frame(FS = c(1), FS_name = c("Rome"), Year = c(2018), class = 
  c("class190"), area_1000ha = c(10.5)) %>% 
  mutate(FS_name = as.character(FS_name)) %>%
  mutate(Year = as.integer(Year)) %>%
  mutate(class = as.character(class)) %>%
  as_tibble()  # tbl_df() is deprecated in current dplyr

data

x <-  data %>% 
  group_by(FS, FS_name, Year, class) %>%
  dplyr::summarise(area_1000ha = sum(area_1000ha, rm.na = TRUE)) %>% 
  ungroup()

As you can see, the mistake is rm.na = rather than na.rm =. When I type it correctly, I get the right result for the area_1000ha variable (10.5). If I don't, i.e. if I keep rm.na =, I get 11.5 instead (+1, in fact). What am I missing?

Upvotes: 0

Views: 320

Answers (2)

Harshal Gajare

Reputation: 615

sum() has no argument called rm.na, so R does not treat it as an option. Instead, it is taken as one more value to sum, and TRUE is coerced to 1.

Use na.rm = TRUE instead and you will get the right result.

The same thing happens with any other argument name:

x <-  data %>% 
  group_by(FS, FS_name, Year, class) %>%
  dplyr::summarise(area_1000ha = sum(area_1000ha, tester = TRUE)) %>% 
  ungroup()

Here I have replaced rm.na with an arbitrary name, tester, and the result is still 11.5:

# A tibble: 1 x 4
  FS_name  Year class    area_1000ha
  <chr>   <int> <chr>          <dbl>
1 Rome     2018 class190        11.5
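
The same behaviour can be reproduced without dplyr. Below is a minimal sketch in base R, relying only on the documented signature sum(..., na.rm = FALSE); the value 10.5 is taken from the question, and the variable names are just for illustration:

```r
# sum() is defined as sum(..., na.rm = FALSE): any argument whose name is
# not na.rm is collected by `...` and treated as one more value to add.
correct <- sum(10.5, na.rm = TRUE)   # na.rm is matched, nothing extra is added
typo    <- sum(10.5, rm.na = TRUE)   # rm.na falls into `...`; TRUE is coerced to 1
renamed <- sum(10.5, tester = TRUE)  # any other unmatched name behaves the same

print(c(correct = correct, typo = typo, renamed = renamed))
```

Only the first call returns 10.5; the other two return 11.5, which matches the tibble above.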

Upvotes: 1

fmarm

Reputation: 4284

I think rm.na = TRUE is not recognised as the na.rm argument, so it is passed to sum() as one more value to add. Since TRUE is treated as 1, you get your initial sum plus 1. If you change TRUE to 2, for example:

x <- data %>% 
  group_by(FS_name, Year, class) %>%
  dplyr::summarise(area_1000ha = sum(area_1000ha, rm.na = 2)) %>% 
  ungroup()

The result is

# A tibble: 1 x 4
  FS_name  Year class    area_1000ha
  <chr>   <int> <chr>          <dbl>
1 Rome     2018 class190        12.5
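
A further consequence worth noting: with the misspelled argument, missing values are not removed either. This is a small sketch in base R; the vector values is a made-up example and is not part of the original data:

```r
# Hypothetical input with one missing value (not from the question's data)
values <- c(10.5, NA)

with_typo <- sum(values, rm.na = TRUE)  # the NA is kept and TRUE is added as a value
with_fix  <- sum(values, na.rm = TRUE)  # the NA is dropped before summing

print(is.na(with_typo))  # the misspelled call still yields NA
print(with_fix)          # the correctly spelled call yields the sum of the non-missing values
```

So the typo silently changes the total when there are no missing values, and fails to remove them when there are.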

Upvotes: 4
