Reputation: 103
I want to create a new column that counts the number of rows matching a given value.
Reproducible data:
data <- tibble(Category = c("A", "B", "A", "A", "A"))
I want the data to end up looking like the tibble below, but instead of typing the CountA column in by hand, I want to create it with a conditional mutate() or something similar that counts the total number of rows where Category is A only:
tibble(Category = c("A", "B", "A", "A", "A"), CountA = c(4,4,4,4,4))
I know that I could filter out the non-A values and then generate the CountA variable, but I still need to keep those rows for a different purpose.
Upvotes: 0
Views: 977
Reputation: 16876
You can create a logical vector inside mutate(), then sum() the number of TRUE values.
library(dplyr)
data %>%
  mutate(countA = sum(Category == "A", na.rm = TRUE))
Or in base R:
data$countA <- sum(data$Category == "A", na.rm = TRUE)
Output
  Category countA
  <chr>     <int>
1 A             4
2 B             4
3 A             4
4 A             4
5 A             4
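To see why this works: comparing a vector with == gives a logical vector, and sum() counts TRUE as 1 and FALSE as 0. Here is a minimal sketch using the same values as the question (the names flags and with_na are only for illustration and are not part of the answer above):

```r
# Same values as the Category column in the question
category <- c("A", "B", "A", "A", "A")

# Element-wise comparison returns a logical vector
flags <- category == "A"

# Summing a logical vector counts the TRUE values
n_a <- sum(flags)

# A missing value makes the plain sum NA; na.rm = TRUE skips it instead
with_na <- c("A", NA, "A")
n_plain <- sum(with_na == "A")
n_safe  <- sum(with_na == "A", na.rm = TRUE)
```

Because the sum is a single number, mutate() recycles it to every row, which is why the B row also gets the A count.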
If you want to create a new count column for every Category, then you could do something like this:
library(tidyverse)
data %>%
  group_by(Category) %>%
  mutate(obs = n(),
         grp = Category,
         row = row_number()) %>%
  pivot_wider(names_from = "grp", values_from = "obs", names_prefix = "Count") %>%
  ungroup() %>%
  select(-row) %>%
  fill(-"Category", .direction = "updown")
Output
  Category CountA CountB
  <chr>     <int>  <int>
1 A             4      1
2 B             4      1
3 A             4      1
4 A             4      1
5 A             4      1
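If the pivot feels heavy, the same per-category columns can probably be built by repeating the sum() idea once per distinct value. This is only a sketch of an alternative, not the approach above; it uses a plain data.frame so it runs without extra packages, and it assumes the same column-assignment behaviour applies to a tibble:

```r
# Same data as in the question, as a base data.frame
data <- data.frame(Category = c("A", "B", "A", "A", "A"))

# Add one Count<value> column per distinct Category value
for (lvl in unique(data$Category)) {
  # The single sum is recycled to every row of the new column
  data[[paste0("Count", lvl)]] <- sum(data$Category == lvl, na.rm = TRUE)
}

data
```

Every row keeps its original Category, and each new column holds the total for one value across the whole data set.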
Upvotes: 1