Reputation: 23
guys
Say I have a dataset that looks like this:
id value year total_value total_frequency
1 10 2019 44 4
1 15 2019 44 4
1 12 2020 44 4
1 7 2020 44 4
2 10 2019 28 2
2 18 2020 28 2
3 27 2019 27 1
I intend to achieve a form like this:
id value year total_value total_frequency 2020_value 2020_frequency
1 10 2019 44 4 19 2
1 15 2019 44 4 19 2
1 12 2020 44 4 19 2
1 7 2020 44 4 19 2
2 10 2019 28 2 18 1
2 18 2020 28 2 18 1
3 27 2019 27 1 0 0
The repetition is necessary for future steps.
I can easily achieve this form via Excel using sumifs() and countifs(), but unable to get the same result in R.
I think I can use dplyr
package to do this but don't know exactly how, hence some advice is needed.
Upvotes: 2
Views: 1433
Reputation: 101383
A base R option using tapply
transform(
df,
value_2020 = tapply(value, list(year, id), sum)["2020", ][id],
frequency_2020 = tapply(year == 2020, list(year, id), sum)["2020", ][id]
)
gives
id value year total_value total_frequency value_2020 frequency_2020
1 1 10 2019 44 4 19 2
2 1 15 2019 44 4 19 2
3 1 12 2020 44 4 19 2
4 1 7 2020 44 4 19 2
5 2 10 2019 28 2 18 1
6 2 18 2020 28 2 18 1
7 3 27 2019 27 1 NA NA
Upvotes: 0
Reputation: 887128
Using data.table
library(data.table)
setDT(df1)[, c('value_2020', 'frequency_2020') := {i1 <- year %in% 2020
.(sum(value[i1]), sum(i1))}, id]
-output
df1
id value year total_value total_frequency value_2020 frequency_2020
1: 1 10 2019 44 4 19 2
2: 1 15 2019 44 4 19 2
3: 1 12 2020 44 4 19 2
4: 1 7 2020 44 4 19 2
5: 2 10 2019 28 2 18 1
6: 2 18 2020 28 2 18 1
7: 3 27 2019 27 1 0 0
Upvotes: 0
Reputation: 388982
Group by id
and sum
the value
for the year
in 2020 and count the number of rows for it as well.
library(dplyr)
df %>%
group_by(id) %>%
mutate(value_2020 = sum(value[year %in% 2020]),
frequency_2020 = sum(year %in% 2020)) %>%
ungroup
# id value year total_value total_frequency value_2020 frequency_2020
# <int> <int> <int> <int> <int> <int> <int>
#1 1 10 2019 44 4 19 2
#2 1 15 2019 44 4 19 2
#3 1 12 2020 44 4 19 2
#4 1 7 2020 44 4 19 2
#5 2 10 2019 28 2 18 1
#6 2 18 2020 28 2 18 1
#7 3 27 2019 27 1 0 0
Upvotes: 4