PanemForever
PanemForever

Reputation: 23

sumifs and countifs equivalent in R

guys

Say I have a dataset that looks like this:

id   value   year   total_value  total_frequency
1      10    2019       44             4
1      15    2019       44             4
1      12    2020       44             4
1      7     2020       44             4
2      10    2019       28             2
2      18    2020       28             2
3      27    2019       27             1

I intend to achieve a form like this:

id   value   year   total_value  total_frequency  2020_value  2020_frequency
1      10    2019       44             4              19           2
1      15    2019       44             4              19           2
1      12    2020       44             4              19           2
1      7     2020       44             4              19           2
2      10    2019       28             2              18           1
2      18    2020       28             2              18           1
3      27    2019       27             1               0           0

The repetition is necessary for future steps.

I can easily achieve this form via Excel using sumifs() and countifs(), but unable to get the same result in R.

I think I can use dplyr package to do this but don't know exactly how, hence some advice is needed.

Upvotes: 2

Views: 1433

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101383

A base R option using tapply

transform(
  df,
  value_2020 = tapply(value, list(year, id), sum)["2020", ][id],
  frequency_2020 = tapply(year == 2020, list(year, id), sum)["2020", ][id]
)

gives

  id value year total_value total_frequency value_2020 frequency_2020
1  1    10 2019          44               4         19              2
2  1    15 2019          44               4         19              2
3  1    12 2020          44               4         19              2
4  1     7 2020          44               4         19              2
5  2    10 2019          28               2         18              1
6  2    18 2020          28               2         18              1
7  3    27 2019          27               1         NA             NA

Upvotes: 0

akrun
akrun

Reputation: 887128

Using data.table

library(data.table)
setDT(df1)[, c('value_2020', 'frequency_2020') := {i1 <- year %in% 2020
            .(sum(value[i1]), sum(i1))}, id]

-output

 df1
   id value year total_value total_frequency value_2020 frequency_2020
1:  1    10 2019          44               4         19              2
2:  1    15 2019          44               4         19              2
3:  1    12 2020          44               4         19              2
4:  1     7 2020          44               4         19              2
5:  2    10 2019          28               2         18              1
6:  2    18 2020          28               2         18              1
7:  3    27 2019          27               1          0              0

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

Group by id and sum the value for the year in 2020 and count the number of rows for it as well.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(value_2020 = sum(value[year %in% 2020]), 
          frequency_2020 = sum(year %in% 2020)) %>%
  ungroup

#     id value  year total_value total_frequency value_2020 frequency_2020
#  <int> <int> <int>       <int>           <int>      <int>          <int>
#1     1    10  2019          44               4         19              2
#2     1    15  2019          44               4         19              2
#3     1    12  2020          44               4         19              2
#4     1     7  2020          44               4         19              2
#5     2    10  2019          28               2         18              1
#6     2    18  2020          28               2         18              1
#7     3    27  2019          27               1          0              0

Upvotes: 4

Related Questions