Reputation: 17651
I have an R data frame with the following format:
column1 column2
NA NA
1 A
1 A
1 A
NA NA
NA NA
2 B
2 B
NA NA
NA NA
3 A
3 A
3 A
df = structure(list(column1 = c(NA, 1L, 1L, 1L, NA, NA, 2L, 2L, NA,
NA, 3L, 3L, 3L), column2 = c(NA, "A", "A", "A", NA, NA, "B",
"B", NA, NA, "A", "A", "A")), .Names = c("column1", "column2"
), row.names = c(NA, -13L), class = "data.frame")
If the row in one column has an NA
, the other column has an NA
.
The numerical value in column1
describes a unique group, e.g. rows 2-4 have the group 1
. The column column2
describes the identity of this grouping. In this data frame, the identity is either A
, B
, C
, or D
.
My goal is to tally the number of identities by group within the entire data frame: how many A groups there are, how many B groups, etc.
The correct output for this file (so far) is there are 2 A groups and 1 B group.
How would I calculate this?
At the moment, I would try something like this:
length(df[df$column2 == "B"]) ## outputs 2
but this is incorrect. If I combined column1
and column2
, took only unique values 1A, 2B, 3A, I guess I could count how many times each label from column2
occurs?
(If it's easier, I'm happy to use data.table
for this task.)
Upvotes: 2
Views: 4030
Reputation: 1114
If you want to use data.table:
library(data.table)
setDT(df)
d <- df[!is.na(column1), list(n=.N), by=list(column2,column1)]
d <- d[, list(n=.N), by=list(column2)]
d
column2 n
1: A 2
2: B 1
Or more concisely as a one-liner:
setDT(df)[!is.na(column1), .N, by = .(column2, column1)][, .N, by = column2]
Upvotes: 4
Reputation: 66819
You can use rle
for runs and table
for tabulation:
table(rle(df$column2)$values)
# A B
# 2 1
See ?rle
and ?table
for details.
Or, if you want to take advantage of column1
(which is derived from column2
):
table(unique(df)$column2)
Upvotes: 5
Reputation: 4216
The 'dplyr' package has simple functions for this
library(dplyr)
df %>%
filter(complete.cases(.) & !duplicated(.)) %>%
group_by(column2) %>%
summarize(count = n())
Upvotes: 4