Parseltongue
Parseltongue

Reputation: 11667

R: Create a column of averages based upon groups of four rows

>head(df)
      person   week target actual drop_out  organization agency
1:       QJ1    1     30     19    TRUE       BB           LLC
2:       GJ2    1     30     18    FALSE      BB           LLC
3:       LJ3    1     30     22    TRUE       CC           BBR
4:       MJ4    1     30     24    FALSE      CC           BBR
5:       PJ5    1     35     55    FALSE      AA           FUN
6:       EJ6    1     35     50    FALSE      AA           FUN

There are around ~30 weeks in the dataset with a repeating Person ID each week.

I want to look at each person's values FOUR weeks at a time (so week 1-4, 5-9, 10-13, and so on). For each of these chunks, I want to add up all the "actual" columns and divide it by the sum of the "target" columns. Then we could put that value in a column called "monthly percent."

As per Shape's recommendation I've created a month column like so

fullReshapedDT$month <- with(fullReshapedDT, ceiling(week/4))

Trying to figure out how to iterate over the month column and calculate averages now. Trying something like this, but it obviously doesn't work:

fullReshapedDT[,.(monthly_attendance = actual/target,by=.(person_id, month)]

Upvotes: 0

Views: 47

Answers (1)

Pierre L
Pierre L

Reputation: 28441

Have you tried creating a group variable? It will allow you to group operations by the four-week period:

setDT(df1)[,grps:=ceiling(week/4)      #Create 4-week groups 
           ][,sum(actual)/sum(target), .(person, grps)  #grouped operations
             ][,grps:=NULL][]     #Remove unnecessary columns
#     person        V1
# 1:     QJ1 1.1076923
# 2:     GJ2 1.1128205
# 3:     LJ3 0.9948718
# 4:     MJ4 0.6333333
# 5:     PJ5 1.2410256
# 6:     EJ6 1.0263158
# 7:     QJ1 1.2108108
# 8:     GJ2 0.6378378
# 9:     LJ3 0.9891892
# 10:    MJ4 0.8564103
# 11:    PJ5 1.1729730
# 12:    EJ6 0.8666667

Upvotes: 1

Related Questions