Reputation: 33
I have a dataframe with an unknown amount of columns (it can change frequently) and I need to count the number of observations for a given ID and year for every column and create a costum "n" column for each column of my dataframe telling me how many observations were made for that specific column.
I have tried:
library(dplyr)
count <- tally(group_by(final_database,ID,Year))
But that will count unique combinations of ID + Year. While I need to know how many times over the years my ID was observed for each characteristic. Example:
ID Year CHAR1 n_CHAR1
A 2016 0 3
A 2017 5 3
A 2018 2 3
A 2019 3
B 2016 1 2
B 2017 2
B 2018 2
B 2019 1 2
And so on for all characteristics. I would insert the "n_CHAR" columns to the original dataframe.
It doesn't need to be tidy. Thanks!
Upvotes: 0
Views: 105
Reputation: 14764
Try:
transform(final_database, n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(x != "")))
If the blank rows are actually NA
, then just replace sum(x != "")
with sum(!is.na(x))
.
Edit:
If you'd need multiple n
columns for multiple NCHAR
columns, you could do:
library(dplyr)
final_database %>%
group_by(ID) %>%
mutate_at(vars(starts_with("CHAR")),
list(n = ~ sum(. != "")))
This example assumes that all the relevant NCHAR
columns start with the string NCHAR
(e.g. NCHAR1
, NCHAR2
, NCHAR3
, etc.).
If the columns you're referring to are 3rd to last, then you can do:
library(dplyr)
finalDatabase <- final_database %>%
group_by(ID) %>%
mutate_at(vars(3:ncol(.)), # If you don't have many other vars except NCHAR, you can also do vars(-ID, -Year) as suggested by @camille
list(n = ~ sum(. != ""))) %>%
select(ID, Year, ends_with("_n"))
Upvotes: 3
Reputation: 18661
We can also do this with data.table
:
library(data.table)
setDT(df)[, n_CHAR1 := sum(CHAR1 != ""), by = "ID"]
Output:
ID Year CHAR1 n_CHAR1
1: A 2016 0 3
2: A 2017 5 3
3: A 2018 2 3
4: A 2019 3
5: B 2016 1 2
6: B 2017 2
7: B 2018 2
8: B 2019 1 2
Data:
df <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "B"),
Year = c(2016L, 2017L, 2018L, 2019L, 2016L, 2017L, 2018L,
2019L), CHAR1 = c("0", "5", "2", "", "1", "", "", "1")), row.names = c(NA,
-8L), class = "data.frame")
Upvotes: 0