codeblue
codeblue

Reputation: 63

Special filtering of a dataframe in R

I am attempting to filter the following data frame:

calendar <- data.frame(Account_Num,Account_Num_Spec,Billing_Num,Transaction_Amt2,color,CAL_DATE2)

In the following manner:

For all rows that have the same contents in the "Account_Num_Spec", "Billing_Num" and "color" - filter them into one row that has the sum of "Transaction_Amt2" and the first oldest date in "CAL_DATE2".

For example

    Ac_Num   AC_Num_Spec  Bill_Num   Amount   Color   Date
1    1162     1162-001     620919      10      red    10/12/19 
2    1179     1179-001     790719      10      red    11/12/19
3    1172     1172-001     721119      10      red    12/12/19
4    1162     1162-001     620919      -5      red    09/12/19
5    1162     1162-001     621019      10      red    11/12/19
6    1179     1179-001     791119      10      green  11/12/19
7    1162     1162-002     620919      10      red    12/12/19
8    1179     1179-001     791079      10      green  10/12/19

goes to:

    Ac_Num   AC_Num_Spec  Bill_Num   Amount   Color   Date
1    1162     1162-001     620919       5      red    09/12/19 
2    1179     1179-001     790719      10      red    11/12/19
3    1179     1179-001     791119      10      red    12/12/19
4    1172     1172-001     721119      10      red    12/12/19
5    1162     1162-001     621019      10      red    11/12/19
6    1179     1179-001     791119      10      green  11/12/19
7    1162     1162-002     620919      10      red    12/12/19
8    1179     1179-001     791079      10      green  10/12/19

Upvotes: 1

Views: 27

Answers (1)

Eben Via
Eben Via

Reputation: 73

I believe using group_by and summarize within the dplyr package should work for what you are asking

calendar %>%
    group_by(Account_Num,Account_Num_Spec,Billing_Num, color) %>%
    summarize(Amount = sum(Amount),
              Date = max(Date))

I know you didn't specify that Account_Num had to be matching, but from your example that looked to be the case so I included it within the group_by function.

Upvotes: 1

Related Questions