Reputation: 1
I have a pretty complicated data frame with may variables so I created a simplified one for the purposes of this question:
sample deployment density replicate shell.num
A A L 1 1
A A L 1 1
A A L 1 2
A A L 2 1
A A L 2 2
B B L 1 1
B B L 1 2
B B L 2 1
B B L 2 2
A C M 1 2
A C M 1 2
A C M 1 3
A C M 2 1
A C M 2 1
What I want to do is generate a count of the number of rows that match every variable and then export that count to a new data frame. I want the new data frame to match the structure of this one such that all variables are listed. For example: For sample = A, deployment = A, density = L, replicate = 1, shell.num = 1 there should be a count of 2 that is exported with that classification.
Essentially the new table would look like this:
sample deployment density replicate shell.num count
A A L 1 1 2
A A L 1 2 1
A A L 2 1 1
A A L 2 2 1
B B L 1 1 1
and so on....
I know there are simple functions like nrow()
, but I don't want to have to do this for every possible variable combination. In my real data frame I have thousands of rows of data I would have to go through, and it would defeat the purpose. Is there a simpler way to organize this into a new data frame so that I can export it as a csv?
Upvotes: 0
Views: 702
Reputation: 50678
Here is solution in base R using table
and duplicated
:
row <- apply(df, 1, paste0, collapse = "_");
df2 <- cbind.data.frame(
df[!duplicated(row), ],
count = as.numeric(table(row)));
df2;
# sample deployment density replicate shell.num count
#1 A A L 1 1 2
#3 A A L 1 2 1
#4 A A L 2 1 1
#5 A A L 2 2 1
#6 B B L 1 1 2
#7 B B L 1 2 1
#8 B B L 2 1 2
#9 B B L 2 2 1
#10 A C M 1 2 1
#12 A C M 1 3 1
#13 A C M 2 1 1
df <- read.table(text =
"sample deployment density replicate shell.num
A A L 1 1
A A L 1 1
A A L 1 2
A A L 2 1
A A L 2 2
B B L 1 1
B B L 1 2
B B L 2 1
B B L 2 2
A C M 1 2
A C M 1 2
A C M 1 3
A C M 2 1
A C M 2 1", header = T)
Upvotes: 2
Reputation: 15072
Here is a simple tidyverse
solution. Basically what you want to do is use group_by
on your variables of interest, and then use summarise(n())
to get the count in each group. The count
function here is just a wrapper for this operation.
library(tidyverse)
tbl <- read_table2(
"sample deployment density replicate shell.num
A A L 1 1
A A L 1 1
A A L 1 2
A A L 2 1
A A L 2 2
B B L 1 1
B B L 1 2
B B L 2 1
B B L 2 2
A C M 1 2
A C M 1 2
A C M 1 3
A C M 2 1
A C M 2 1"
)
tbl %>%
count(sample, deployment, density, replicate, shell.num)
#> # A tibble: 11 x 6
#> sample deployment density replicate shell.num n
#> <chr> <chr> <chr> <int> <int> <int>
#> 1 A A L 1 1 2
#> 2 A A L 1 2 1
#> 3 A A L 2 1 1
#> 4 A A L 2 2 1
#> 5 A C M 1 2 2
#> 6 A C M 1 3 1
#> 7 A C M 2 1 2
#> 8 B B L 1 1 1
#> 9 B B L 1 2 1
#> 10 B B L 2 1 1
#> 11 B B L 2 2 1
Created on 2018-02-26 by the reprex package (v0.2.0).
Upvotes: 1