Reputation:
I have a dataframe:
ID SUB_ID Action
1 NA Preparate1
1 NA Preparate2
1 A Open
1 A Download
1 A Close
1 B Open
1 B Search
1 B Download
1 B Close
2 AA Open
2 AA Download
2 AA Close
2 BB Open
2 BB Search
2 BB Filter
2 BB Close
3 C Open
3 C Search
3 C Filter
3 C Close
I want to get table with ID and number of SUB_ID per ID and number of "Download" in column Action. So, desired result is:
ID SUB_ID_n Download_n
1 2 2
2 2 1
3 1 0
How could i do that? As you see there are sometimes rows with NA in SUB_ID. I don't want to count those NA as additional SUB_ID within ID.
I have tried this but its counting NA in SUB_ID:
setDT(df)[, .(SUB_ID_n = uniqueN(SUB_ID),
Download_n = sum(Action == 'Download')), ID]
Upvotes: 0
Views: 44
Reputation: 887118
here, we need to use na.rm = TRUE
as by default is FALSE
library(data.table)
setDT(df)[, .(SUB_ID_n = uniqueN(SUB_ID, na.rm = TRUE),
Download_n = sum(Action == 'Download')), ID]
# ID SUB_ID_n Download_n
#1: 1 2 2
#2: 2 2 1
#3: 3 1 0
Similarly, the n_distinct
from dplyr
also have na.rm
argument
library(dplyr)
df %>%
group_by(ID) %>%
summarise(SUB_ID_n = n_distinct(SUB_ID, na.rm = TRUE),
Download_n = sum(Action == 'Download'), .groups = 'drop')
# A tibble: 3 x 3
# ID SUB_ID_n Download_n
# <int> <int> <int>
#1 1 2 2
#2 2 2 1
#3 3 1 0
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), SUB_ID = c(NA, NA, "A",
"A", "A", "B", "B", "B", "B", "AA", "AA", "AA", "BB", "BB", "BB",
"BB", "C", "C", "C", "C"), Action = c("Preparate1", "Preparate2",
"Open", "Download", "Close", "Open", "Search", "Download", "Close",
"Open", "Download", "Close", "Open", "Search", "Filter", "Close",
"Open", "Search", "Filter", "Close")), class = "data.frame", row.names = c(NA,
-20L))
Upvotes: 0
Reputation: 39595
Try this tidyverse
approach. The key is to subset non NA
values in summarise()
. Here the code:
library(tidyverse)
#Data and code
df %>% group_by(ID) %>%
summarise(N_id=n_distinct(SUB_ID[!is.na(SUB_ID)]),
N_Action=length(Action[Action=='Download']))
Output:
# A tibble: 3 x 3
ID N_id N_Action
<int> <int> <int>
1 1 2 2
2 2 2 1
3 3 1 0
Some data used:
#Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), SUB_ID = c(NA, NA, "A",
"A", "A", "B", "B", "B", "B", "AA", "AA", "AA", "BB", "BB", "BB",
"BB", "C", "C", "C", "C"), Action = c("Preparate1", "Preparate2",
"Open", "Download", "Close", "Open", "Search", "Download", "Close",
"Open", "Download", "Close", "Open", "Search", "Filter", "Close",
"Open", "Search", "Filter", "Close")), class = "data.frame", row.names = c(NA,
-20L))
Upvotes: 1