Mag

Reputation: 11

Binary database into a frequency table

I am using R to write a report for a class, and I have a fairly large binary data set (1 and NA) that indicates presence or absence.

# A tibble: 149 × 31
    Vide Copé. Ca…¹ Copé.…² Copé.…³ Copé.…⁴ Polyc…⁵ Néréi…⁶ Pecti…⁷ Crang…⁸ Mysid…⁹
   <dbl>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1     0          0       0       0       0       0       0       0       0       0
 2     0          0       0       0       0       0       1       0       0       0
 3     0          0       0       0       0       0       1       0       0       0
 4     0          0       0       0       0       0       0       0       0       1
 5     0          0       0       0       0       0       1       0       0       0
 6     0          0       0       0       0       0       0       0       0       0
 7     0          0       0       0       0       0       1       0       0       0
 8     0          0       0       0       0       0       1       0       0       0
 9     0          0       0       0       0       0       0       0       0       0
10     0          0       0       0       0       0       0       0       0       0
# … with 139 more rows, 21 more variables: `Carides sp.` <dbl>, Amphipodes <dbl>,
#   `Pandalidés(crevette nordique)` <dbl>, Cumacés <dbl>, Isopodes <dbl>,
#   `Crustacés sp.` <dbl>, Éperlan...17 <dbl>, Capucette <dbl>,
#   `Épinoche sp.` <dbl>, `Poisson sp.` <dbl>, Gastéropode <dbl>, Bivalve <dbl>,
#   `Poulamon Atlantique` <dbl>, `Éperlan arc-en-ciel` <dbl>, Éperlan...25 <dbl>,
#   HARENG <dbl>, OSMÉRIDÉ <dbl>, Moronidé <dbl>, `Bar rayé` <dbl>, Baret <dbl>,
#   `Alose savoureuse` <dbl>, and abbreviated variable names ¹`Copé. Cala.`, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

I need to represent the frequency of presence for each category:

           Frequency
Vide           0
Copépodes      2
Néréidés sp.   5
etc.

Is there a way for me to do this without recreating the database from scratch? I can't seem to find out how online. It's my first time posting a question here, and I'm quite new to R, so I'm not sure how to approach this.

Upvotes: 1

Views: 73

Answers (3)

Elin

Reputation: 6755

You could also use skimr.

skimr::skim(yourdata)

This will give you a lot of summary statistics for all your variables, including the number of missing and complete values and the sum with na.rm = TRUE.

You could also use the output data frame if you want to further modify it.

Upvotes: -1

r2evans

Reputation: 160397

Sample data:

set.seed(42)
dat <- as.data.frame(lapply(setNames(nm=letters[1:5]), function(z) sample(0:1, 10, replace=TRUE)))
dat
#    a b c d e
# 1  0 0 0 1 0
# 2  0 1 0 1 0
# 3  0 0 0 1 1
# 4  0 1 0 1 1
# 5  1 0 0 0 1
# 6  1 0 1 1 1
# 7  1 1 0 0 1
# 8  1 1 0 1 1
# 9  0 1 0 1 1
# 10 1 1 0 1 0

Straightforward code:

stack(sapply(dat, sum))
#   values ind
# 1      5   a
# 2      6   b
# 3      1   c
# 4      8   d
# 5      7   e

Thanks @Friede, colSums is clearly better than sapply(dat, sum), not sure why I missed that...

stack(colSums(dat))
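
Since the question mentions that the real data is coded as 1 and NA, here is a minimal base R sketch of the same idea with missing values ignored. The `diet` data frame and its column names are hypothetical stand-ins for the asker's table, not the actual data; `na.rm = TRUE` is what keeps the NA cells from turning every sum into NA.

```r
# Hypothetical miniature of the asker's table: 1 = present, NA = absent
diet <- data.frame(
  Vide      = rep(NA_real_, 4),
  Copepodes = c(1, NA, 1, NA),
  Nereides  = c(1, 1, NA, 1)
)

# Count the 1s per column, skipping NA cells
freq <- colSums(diet, na.rm = TRUE)

# One-column table with the categories as row names,
# similar to the layout sketched in the question
freq_table <- data.frame(Frequency = freq)
freq_table
```

If a two-column layout is preferred, `stack(freq)` should work on the same named vector, as in the code above.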

Upvotes: 3

GuedesBF

Reputation: 9858

If we are using the tidyverse, we can use summarise (and pivot_longer if needed):

library(dplyr)
library(tidyr)

dat |> 
    summarise(across(everything(), \(x) sum(x, na.rm = TRUE))) |> 
    pivot_longer(everything(), values_to = "Frequency")

With @r2evans' data:

# A tibble: 5 × 2
  name  Frequency
  <chr>     <int>
1 a             5
2 b             6
3 c             1
4 d             8
5 e             7
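
As a rough sketch of how this could look on data coded as 1 and NA, the pipeline below also names the category column and sorts by frequency. The `diet` tibble, its column names, and the "Category" label are assumptions made for illustration; only `summarise`, `across`, `pivot_longer`, and `arrange` come from dplyr and tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the asker's table: 1 = present, NA = absent
diet <- tibble(
  Vide      = rep(NA_real_, 4),
  Copepodes = c(1, NA, 1, NA),
  Nereides  = c(1, 1, NA, 1)
)

freq <- diet |>
  # Sum each column, ignoring NA cells
  summarise(across(everything(), \(x) sum(x, na.rm = TRUE))) |>
  # One row per category instead of one column per category
  pivot_longer(everything(), names_to = "Category", values_to = "Frequency") |>
  # Most frequent categories first
  arrange(desc(Frequency))

freq
```

The `names_to` argument only replaces the default `name` column header, so it can be dropped if the default is acceptable.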

Upvotes: 2
