Huskir
Huskir

Reputation: 33

Count number of occurences for every column in dataframe

I have a dataframe with an unknown amount of columns (it can change frequently) and I need to count the number of observations for a given ID and year for every column and create a costum "n" column for each column of my dataframe telling me how many observations were made for that specific column.

I have tried:

library(dplyr)
count <- tally(group_by(final_database,ID,Year))

But that will count unique combinations of ID + Year. While I need to know how many times over the years my ID was observed for each characteristic. Example:

ID  Year    CHAR1   n_CHAR1
A   2016    0       3   
A   2017    5       3
A   2018    2       3
A   2019            3
B   2016    1       2
B   2017            2
B   2018            2
B   2019    1       2

And so on for all characteristics. I would insert the "n_CHAR" columns to the original dataframe.

It doesn't need to be tidy. Thanks!

Upvotes: 0

Views: 105

Answers (2)

arg0naut91
arg0naut91

Reputation: 14764

Try:

transform(final_database, n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(x != "")))

If the blank rows are actually NA, then just replace sum(x != "") with sum(!is.na(x)).

Edit:

If you'd need multiple n columns for multiple NCHAR columns, you could do:

library(dplyr)

final_database %>%
  group_by(ID) %>%
  mutate_at(vars(starts_with("CHAR")),
            list(n = ~ sum(. != "")))

This example assumes that all the relevant NCHAR columns start with the string NCHAR (e.g. NCHAR1, NCHAR2, NCHAR3, etc.).

If the columns you're referring to are 3rd to last, then you can do:

library(dplyr)

finalDatabase <- final_database %>%
  group_by(ID) %>%
  mutate_at(vars(3:ncol(.)), # If you don't have many other vars except NCHAR, you can also do vars(-ID, -Year) as suggested by @camille
            list(n = ~ sum(. != ""))) %>%
  select(ID, Year, ends_with("_n"))

Upvotes: 3

acylam
acylam

Reputation: 18661

We can also do this with data.table:

library(data.table)

setDT(df)[, n_CHAR1 := sum(CHAR1 != ""), by = "ID"]

Output:

   ID Year CHAR1 n_CHAR1
1:  A 2016     0       3
2:  A 2017     5       3
3:  A 2018     2       3
4:  A 2019             3
5:  B 2016     1       2
6:  B 2017             2
7:  B 2018             2
8:  B 2019     1       2

Data:

df <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "B"), 
    Year = c(2016L, 2017L, 2018L, 2019L, 2016L, 2017L, 2018L, 
    2019L), CHAR1 = c("0", "5", "2", "", "1", "", "", "1")), row.names = c(NA, 
-8L), class = "data.frame")

Upvotes: 0

Related Questions