Wendy

Reputation: 103

New Variable that counts based on row values in R

I want to create a new column that holds the number of rows matching a given value.

Creating replicable data:

library(tibble)
data <- tibble(Category = c("A", "B", "A", "A", "A"))

I want the data to end up looking like the tibble below. Rather than typing the values in by hand as I do here, I'd like to create the new variable CountA with a conditional mutate() or something similar, so that it holds the total number of rows where Category is A:

tibble(Category = c("A", "B", "A", "A", "A"), CountA = c(4,4,4,4,4))

I know I could filter out the non-A values and then generate CountA, but I still need to keep those rows for a different purpose.

Upvotes: 0

Views: 977

Answers (1)

AndrewGB

Reputation: 16876

You can create a logical vector inside mutate() and then sum it; each TRUE counts as 1, so the sum is the number of matching rows.

library(dplyr)

data %>% 
  mutate(countA = sum(Category == "A", na.rm = TRUE))

Or in base R:

data$countA <- sum(data$Category == "A", na.rm = TRUE)

Output

  Category countA
  <chr>     <int>
1 A             4
2 B             4
3 A             4
4 A             4
5 A             4
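
As a small illustration of why this works, here is a base R sketch using the same values as the question's Category column. The vector with an NA is made up purely to show what na.rm = TRUE is guarding against; it is not part of the original data.

```r
# Same values as the Category column in the question
category <- c("A", "B", "A", "A", "A")

# Element-wise comparison returns a logical vector (TRUE where the value is "A")
is_a <- category == "A"

# sum() coerces TRUE to 1 and FALSE to 0, so this counts the matches
count_a <- sum(is_a)

# Hypothetical example with a missing value: the comparison yields NA there,
# and sum() propagates the NA unless na.rm = TRUE is set
category_na <- c("A", NA, "A")
count_without_rm <- sum(category_na == "A")
count_with_rm <- sum(category_na == "A", na.rm = TRUE)
```

Because the result of sum() is a single number, mutate() (or the base R assignment) recycles it to every row, which is why the non-A rows are kept and also show the count.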

If you want a count column for every value of Category, you could do something like this:

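library(tidyverse)

data %>%
  group_by(Category) %>%
  mutate(obs = n(),               # number of rows in each Category
         grp = Category,          # copy of Category to turn into column names
         row = row_number()) %>%  # within-group index so every row stays distinct
  pivot_wider(names_from = "grp", values_from = "obs", names_prefix = "Count") %>%
  ungroup() %>%
  select(-row) %>%
  # pivot_wider leaves NA where a row belongs to another Category; fill them in
  fill(-Category, .direction = "updown")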

Output

  Category CountA CountB
  <chr>     <int>  <int>
1 A             4      1
2 B             4      1
3 A             4      1
4 A             4      1
5 A             4      1
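
If the pivot feels heavy, a simpler alternative (not the approach above, just a sketch of the same idea) might be to loop over the distinct values in base R and add one column per value. This assumes a plain data.frame built from the question's values, and the Count prefix is carried over from the output above.

```r
# Rebuild the question's data as a base data.frame (assumption: no tibble needed)
data <- data.frame(Category = c("A", "B", "A", "A", "A"))

# For each distinct Category value, add a column holding its total row count
for (lvl in unique(data$Category)) {
  # The single number returned by sum() is recycled across all rows
  data[[paste0("Count", lvl)]] <- sum(data$Category == lvl, na.rm = TRUE)
}
```

This keeps every original row and avoids the fill() step, since no NA values are introduced along the way.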

Upvotes: 1
