123blee

Reputation: 109

Summary statistics needed when NAs are present in a data frame's common rows

I have a data frame that may or may not contain NAs. If NAs are present, they occur in the same rows across all numeric columns. The code below computes the summary statistics as desired when no NAs are present, but when NAs are present every summary statistic comes back as NA.

I have a feeling that the solution in the question linked below will figure into mine, though my attempts so far have not been successful. One of my columns is of character type ('a' in this example) and has to be skipped with is.numeric.

R: summarise a dataframe with NAN in columns summarise(across(.fns = na.omit))


library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA)
)

fimber

# A tibble: 5 × 5
#   a         b     c     d     e
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1         8    10    50    80
# 2 2         9    15    60    90
# 3 3        10    20    70   100
# 4 4        NA    NA    NA    NA
# 5 5        NA    NA    NA    NA



# Works fine with no NAs

fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), min  )), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), max  )), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), mean  )), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), median  )), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), sd  )), 2) )


# My attempt with na.omit (not successful)
fimber %>% add_row(a = "Median", round( summarise( across(.fns = na.omit, where(is.numeric)), median  ), 2) )




Upvotes: 1

Views: 86

Answers (2)

Jeni

Reputation: 968

By using the ~ symbol in the .fns argument, you can write a custom function that passes na.rm = TRUE to each summary function:

fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), ~min(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), ~max(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), ~mean(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), ~median(.x, na.rm = TRUE)  )), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), ~sd(.x, na.rm = TRUE)  )), 2) )

These commands produce the following output:

# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Min       8    10    50    80

# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Max      10    20    70   100

# A tibble: 6 x 5
  a         b     c     d     e
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1         8    10    50    80
2 2         9    15    60    90
3 3        10    20    70   100
4 4        NA    NA    NA    NA
5 5        NA    NA    NA    NA
6 Mean      9    15    60    90

# A tibble: 6 x 5
  a          b     c     d     e
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          8    10    50    80
2 2          9    15    60    90
3 3         10    20    70   100
4 4         NA    NA    NA    NA
5 5         NA    NA    NA    NA
6 Median     9    15    60    90

# A tibble: 6 x 5
  a            b     c     d     e
  <chr>    <dbl> <dbl> <dbl> <dbl>
1 1            8    10    50    80
2 2            9    15    60    90
3 3           10    20    70   100
4 4           NA    NA    NA    NA
5 5           NA    NA    NA    NA
6 St. Dev.     1     5    10    10
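
If all five statistics are wanted in a single table rather than five separate ones, the same lambda idea can be wrapped in a small helper. This is only a sketch: summary_row and the stats list are hypothetical names introduced here, not part of dplyr, and the helper assumes the label column is called a, as in the question.

```r
library(dplyr)
library(tibble)

fimber <- tibble(a = c("1", "2", "3", "4", "5"),
                 b = c(8, 9, 10, NA, NA),
                 c = c(10, 15, 20, NA, NA),
                 d = c(50, 60, 70, NA, NA),
                 e = c(80, 90, 100, NA, NA))

# Hypothetical helper: builds one summary row labelled `label`,
# applying `fn` with na.rm = TRUE to every numeric column.
summary_row <- function(data, label, fn) {
  data %>%
    summarise(across(where(is.numeric), ~ round(fn(.x, na.rm = TRUE), 2))) %>%
    mutate(a = label, .before = 1)
}

# Named list of the statistics to compute (the names become the row labels)
stats <- list(Min = min, Max = max, Mean = mean, Median = median, `St. Dev.` = sd)

# One summary row per statistic, stacked under the original data
summary_rows <- bind_rows(
  lapply(names(stats), function(nm) summary_row(fimber, nm, stats[[nm]]))
)
result <- bind_rows(fimber, summary_rows)
result
```

Here result holds the five original rows followed by the five summary rows, so the NA rows stay visible in the data while being ignored by the statistics.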

Upvotes: 1

Mohamed Desouky

Reputation: 4425

Try this, which passes na.rm = TRUE through across() to each summary function:

fimber %>% add_row(a = "Min", round( summarise(., across(where(is.numeric), min  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Max", round( summarise(., across(where(is.numeric), max  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Mean", round( summarise(., across(where(is.numeric), mean  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "Median", round( summarise(., across(where(is.numeric), median  , na.rm = TRUE)), 2) )
fimber %>% add_row(a = "St. Dev.", round( summarise(., across(where(is.numeric), sd  , na.rm = TRUE)), 2) )

Upvotes: 0
