David.L
David.L

Reputation: 1

R Counting occurrences by conditions and putting it in a new dataframe

I have a pretty complicated data frame with may variables so I created a simplified one for the purposes of this question:

sample  deployment  density replicate   shell.num   
  A       A           L         1          1         
  A       A           L         1          1         
  A       A           L         1          2         
  A       A           L         2          1         
  A       A           L         2          2         
  B       B           L         1          1         
  B       B           L         1          2         
  B       B           L         2          1         
  B       B           L         2          2         
  A       C           M         1          2        
  A       C           M         1          2       
  A       C           M         1          3        
  A       C           M         2          1         
  A       C           M         2          1         

What I want to do is generate a count of the number of rows that match every variable and then export that count to a new data frame. I want the new data frame to match the structure of this one such that all variables are listed. For example: For sample = A, deployment = A, density = L, replicate = 1, shell.num = 1 there should be a count of 2 that is exported with that classification.

Essentially the new table would look like this:

sample  deployment  density replicate   shell.num   count
  A         A          L        1            1        2
  A         A          L        1            2        1
  A         A          L        2            1        1
  A         A          L        2            2        1
  B         B          L        1            1        1

and so on....

I know there are simple functions like nrow(), but I don't want to have to do this for every possible variable combination. In my real data frame I have thousands of rows of data I would have to go through, and it would defeat the purpose. Is there a simpler way to organize this into a new data frame so that I can export it as a csv?

Upvotes: 0

Views: 702

Answers (2)

Maurits Evers
Maurits Evers

Reputation: 50678

Here is solution in base R using table and duplicated:

row <- apply(df, 1, paste0, collapse = "_");
df2 <- cbind.data.frame(
    df[!duplicated(row), ],
    count = as.numeric(table(row)));
df2;
#   sample deployment density replicate shell.num count
#1       A          A       L         1         1     2
#3       A          A       L         1         2     1
#4       A          A       L         2         1     1
#5       A          A       L         2         2     1
#6       B          B       L         1         1     2
#7       B          B       L         1         2     1
#8       B          B       L         2         1     2
#9       B          B       L         2         2     1
#10      A          C       M         1         2     1
#12      A          C       M         1         3     1
#13      A          C       M         2         1     1

Sample data

df <- read.table(text =
    "sample  deployment  density replicate   shell.num
  A       A           L         1          1
  A       A           L         1          1
  A       A           L         1          2
  A       A           L         2          1
  A       A           L         2          2
  B       B           L         1          1
  B       B           L         1          2
  B       B           L         2          1
  B       B           L         2          2
  A       C           M         1          2
  A       C           M         1          2
  A       C           M         1          3
  A       C           M         2          1
  A       C           M         2          1", header = T)

Upvotes: 2

Calum You
Calum You

Reputation: 15072

Here is a simple tidyverse solution. Basically what you want to do is use group_by on your variables of interest, and then use summarise(n()) to get the count in each group. The count function here is just a wrapper for this operation.

library(tidyverse)
tbl <- read_table2(
  "sample  deployment  density replicate   shell.num
A       A           L         1          1
A       A           L         1          1
A       A           L         1          2
A       A           L         2          1
A       A           L         2          2
B       B           L         1          1
B       B           L         1          2
B       B           L         2          1
B       B           L         2          2
A       C           M         1          2
A       C           M         1          2
A       C           M         1          3
A       C           M         2          1
A       C           M         2          1"
)

tbl %>%
  count(sample, deployment, density, replicate, shell.num)
#> # A tibble: 11 x 6
#>    sample deployment density replicate shell.num     n
#>    <chr>  <chr>      <chr>       <int>     <int> <int>
#>  1 A      A          L               1         1     2
#>  2 A      A          L               1         2     1
#>  3 A      A          L               2         1     1
#>  4 A      A          L               2         2     1
#>  5 A      C          M               1         2     2
#>  6 A      C          M               1         3     1
#>  7 A      C          M               2         1     2
#>  8 B      B          L               1         1     1
#>  9 B      B          L               1         2     1
#> 10 B      B          L               2         1     1
#> 11 B      B          L               2         2     1

Created on 2018-02-26 by the reprex package (v0.2.0).

Upvotes: 1

Related Questions