Count number of occurrences for every column in dataframe

Question

I have a dataframe with an unknown number of columns (it can change frequently). For every column, I need to count the number of observations for a given ID across years, and create a custom "n" column for each column of my dataframe telling me how many observations were made for that specific column.

I have tried:

library(dplyr)
count <- tally(group_by(final_database,ID,Year))

But that counts unique combinations of ID + Year, whereas I need to know how many times over the years each ID was observed for each characteristic. Example:

ID  Year    CHAR1   n_CHAR1
A   2016    0       3   
A   2017    5       3
A   2018    2       3
A   2019            3
B   2016    1       2
B   2017            2
B   2018            2
B   2019    1       2

And so on for all characteristics. I would then add the "n_CHAR" columns to the original dataframe.

It doesn't need to be tidy. Thanks!

arg0naut91 · Accepted Answer

Try:

transform(final_database, n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(x != "")))

If the blank entries are actually NA, replace sum(x != "") with sum(!is.na(x)).
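
As a minimal sketch of the NA variant, the snippet below rebuilds the example from the question with a hypothetical final_database in which the blank cells are stored as NA (an assumption about how the data was read in). It counts the non-missing CHAR1 values per ID and repeats that count on every row of the ID:

```r
# Hypothetical sample data mirroring the question's example;
# the blank cells are assumed to be stored as NA.
final_database <- data.frame(
  ID    = rep(c("A", "B"), each = 4),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c(0, 5, 2, NA, 1, NA, NA, 1)
)

# ave() applies FUN within each ID group and returns a vector as long as
# CHAR1, so the per-ID count is repeated on every row of that ID.
result <- transform(
  final_database,
  n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(!is.na(x)))
)

print(result)
```

With this sample data, n_CHAR1 should come out as 3 for every row of ID A and 2 for every row of ID B, matching the table in the question.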

Edit:

If you need an n column for each of several CHAR columns, you could do:

library(dplyr)

final_database %>%
  group_by(ID) %>%
  mutate_at(vars(starts_with("CHAR")),
            list(n = ~ sum(. != "")))

This example assumes that all the relevant CHAR columns start with the string CHAR (e.g. CHAR1, CHAR2, CHAR3, etc.). When several columns match, the new columns get an _n suffix (CHAR1_n, CHAR2_n, etc.).

If the relevant columns run from the 3rd column to the last, then you can do:

library(dplyr)

finalDatabase <- final_database %>%
  group_by(ID) %>%
  # If you have few variables other than the CHAR columns, you can also use
  # vars(-ID, -Year), as suggested by @camille
  mutate_at(vars(3:ncol(.)),
            list(n = ~ sum(. != ""))) %>%
  select(ID, Year, ends_with("_n"))
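
mutate_at() has been superseded by across() since dplyr 1.0.0, so on a recent dplyr the same idea could be sketched as follows. This is not part of the original answer: the sample data is hypothetical, the blanks are assumed to be NA, and the n_ prefix is chosen here to match the column names asked for in the question.

```r
library(dplyr)

# Hypothetical sample data with two characteristic columns; blanks stored as NA.
final_database <- data.frame(
  ID    = rep(c("A", "B"), each = 4),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c(0, 5, 2, NA, 1, NA, NA, 1),
  CHAR2 = c(NA, 1, NA, NA, 3, 4, 5, NA)
)

counts <- final_database %>%
  group_by(ID) %>%
  # across() skips the grouping column ID automatically; -Year drops Year,
  # so every remaining column gets a matching n_<column> count.
  mutate(across(-Year, ~ sum(!is.na(.x)), .names = "n_{.col}")) %>%
  ungroup()

print(counts)
```

Because the selection is "everything except ID and Year", this keeps working when the number of characteristic columns changes, which is the situation described in the question.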
