KNW

Reputation: 243

Number of significant digits in dplyr summarise

I am having trouble getting the desired number of decimal places from summarise. Here is a simple example:

test2  <- data.frame(c("a","a","b","b"), c(245,246,247,248))
library(dplyr)
colnames(test2)  <- c("V1","V2")
group_by(test2,V1) %>% summarise(mean(V2))

The dataframe is:

  V1  V2
1  a 245
2  a 246
3  b 247
4  b 248

The output is:

 V1     `mean(V2)`
 <fctr>      <dbl>
1 a             246
2 b             248

I would like the output to show the means with their decimal place (i.e. 245.5 and 247.5).

Upvotes: 23

Views: 36637

Answers (3)

Indrajeet Patil

Reputation: 4879

Here is one solution:

test2  <- data.frame(c("a", "a", "b", "b"), c(245, 246, 247, 248))
library(dplyr)
colnames(test2)  <- c("V1", "V2")
group_by(test2, V1) %>% 
  dplyr::summarise(mean(V2)) %>% 
  # format() returns character strings; nsmall = 1 guarantees at least one decimal place
  dplyr::mutate_if(is.numeric, format, nsmall = 1)
#> # A tibble: 2 x 2
#>   V1    `mean(V2)`
#>   <fct> <chr>     
#> 1 a     245.5     
#> 2 b     247.5

Created on 2018-01-20 by the reprex package (v0.1.1.9000).

EDIT:

If you want to keep the column numeric:

test2  <- data.frame(c("a", "a", "b", "b"), c(245, 246, 247, 248))
library(dplyr)
colnames(test2)  <- c("V1", "V2")
group_by(test2, V1) %>% 
  dplyr::summarise(mean(V2)) %>% 
  as.data.frame(.) %>% 
  dplyr::mutate_if(is.numeric, round, 1)

This gives:

  V1 mean(V2)
1  a    245.5
2  b    247.5

Another example (from @Matifou):

tab <- tibble(x = c(0.1234, 1.1234, 10.1234, 100.1234, 1000.1234))

tab %>%  
  as.data.frame(.) %>% 
  dplyr::mutate_if(is.numeric, round, 2)

This gives:

        x
1    0.12
2    1.12
3   10.12
4  100.12
5 1000.12

Upvotes: 9

Matifou

Reputation: 8880

Because you are using dplyr tools, the result is a tibble, which by default prints numbers with 3 significant digits (see the option pillar.sigfig). That is not the same as the number of digits after the decimal point. Only the printed display is affected; the stored values are not rounded. To see a fixed number of digits after the decimal point, simply convert the result to a data.frame with as.data.frame().
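As a quick check that only the display is shortened, here is a minimal sketch using the data from the question. The column name `mean_V2` is chosen here for illustration and is not part of the original code.

```r
library(dplyr)

# Same data as in the question, with the column names set up front.
test2 <- data.frame(V1 = c("a", "a", "b", "b"), V2 = c(245, 246, 247, 248))

# `mean_V2` is an illustrative name, not from the original post.
res <- test2 %>%
  group_by(V1) %>%
  summarise(mean_V2 = mean(V2))

# Pull the column out as a plain numeric vector. Printing a vector uses
# base R's formatting rather than the tibble printer.
res$mean_V2
#> [1] 245.5 247.5
```

The means are stored at full precision, so any later computation on the tibble uses 245.5 and 247.5 regardless of how it prints.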

Note that tibble's concept of significant digits is somewhat complicated: it does not set how many digits appear after the decimal point, but the minimum number of digits needed to represent the number to a given accuracy (I think 99.9%, see discussion here).

This means the number of digits printed depends on the "size" of your number:

library(tibble)
packageVersion("tibble")
#> [1] '2.1.3'
packageVersion("pillar")
#> [1] '1.4.2'
tab <- tibble(x = c(0.1234, 1.1234, 10.1234, 100.1234, 1000.1234))

options(pillar.sigfig=3)
tab
#> # A tibble: 5 x 1
#>          x
#>      <dbl>
#> 1    0.123
#> 2    1.12 
#> 3   10.1  
#> 4  100.   
#> 5 1000.

options(pillar.sigfig=4)
tab
#> # A tibble: 5 x 1
#>           x
#>       <dbl>
#> 1    0.1234
#> 2    1.123 
#> 3   10.12  
#> 4  100.1   
#> 5 1000.

as.data.frame(tab)
#>           x
#> 1    0.1234
#> 2    1.1234
#> 3   10.1234
#> 4  100.1234
#> 5 1000.1234

Created on 2019-08-21 by the reprex package (v0.3.0)
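Applied to the data from the question, the same option can be raised just for one print call and then restored. This is a sketch only: the column name `mean_V2` is an assumption made for illustration, and the exact layout of the printed tibble may differ between tibble and pillar versions.

```r
library(dplyr)

test2 <- data.frame(V1 = c("a", "a", "b", "b"), V2 = c(245, 246, 247, 248))

# `mean_V2` is an illustrative name, not from the original post.
res <- test2 %>%
  group_by(V1) %>%
  summarise(mean_V2 = mean(V2))

# options() returns the previous settings, so they can be restored afterwards.
old <- options(pillar.sigfig = 4)
print(res)     # with 4 significant digits the means should show one decimal place
options(old)   # put the previous setting back
```

Restoring the option keeps the change from affecting how other tibbles print later in the session.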

Upvotes: 18

Rafael Díaz

Reputation: 2289

I think the simplest solution is the following:

test2  <- data.frame(c("a","a","b","b"), c(245,246,247,248))
library(dplyr)
colnames(test2)  <- c("V1","V2")
group_by(test2,V1) %>% summarise(`mean(V2)` = sprintf("%0.1f",mean(V2)))
# A tibble: 2 x 2
  V1    `mean(V2)`
  <fct> <chr>     
1 a     245.5     
2 b     247.5     
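One thing to keep in mind with this approach, as the `<chr>` column type in the output shows: sprintf() returns character strings, so the column has to be converted back before any arithmetic. A small base R sketch, with illustrative variable names:

```r
# The two group means from the question, as plain numbers.
means <- c(245.5, 247.5)

# sprintf() formats them with one decimal place, producing strings.
labels <- sprintf("%0.1f", means)
class(labels)
#> [1] "character"

# Convert back to numeric before doing any further calculation.
as.numeric(labels) + 1
#> [1] 246.5 248.5
```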

Upvotes: 1
