Jain

Reputation: 999

Sorting and aggregating in R

I used the aggregate function in R to reduce my data from about 90k entries to about 1,800.

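a <- test$ID
b <- test$Date
c <- test$Value
d <- test$Value1
# Sum Value and Value1 within each Date/ID group
sumA <- aggregate(c, by = list(Date = b, Id = a), FUN = sum)
sumB <- aggregate(d, by = list(Date = b, Id = a), FUN = sum)
# Keep the two grouping columns, then add the ratio of the two sums
final <- sumA[1:2]
final$x <- sumA$x / sumB$x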

Now I have data for 20 different dates in a month, with close to 90 different IDs each day, so there are around 1,800 entries in the final table.

My question: I want to aggregate further and find the maximum value of final[3] for each date, so that I am left with just 20 values.

In simple terms: there are 20 days, and each day has 90 values for 90 IDs. I want to find the maximum of these 90 values for each day, so that at the end I am left with just 20 values for 20 days.

However, the aggregate function is not working here when I use 'max' instead of 'sum'.

Date    ID    Value   Value1 
1        A      20      10  
1        A      25      5  
1        B      50      5  
1        B      50      5
1        C      25      25 
1        C      35      5  
2        A      30      10  
2        A      25      45  
2        B      40      10   
2        B      40      30  

This is the data.

Using the aggregate function, I got this final table:

Date    ID   x
1       A    45/15=3
1       B    100/10=10
1       C    60/30=2
2       A    55/55=1
2       B    80/40=2

Now I want the maximum value for dates 1 and 2, that's it:

Date   max-Value
1      10
2       2

Upvotes: 0

Views: 1386

Answers (2)

Shambho

Reputation: 3280

This is a one-step process with the data.table package. A data.table is an enhanced data.frame: it inherits from the data.frame class, so most code written for a data.frame also works on it.

Step 0: Convert the data.frame to a data.table and key it by Date and ID:

library(data.table)
setDT(test)
setkey(test,Date,ID)

Step 1: Do the computation:

test[,sum(Value)/sum(Value1),by=key(test)][,max(V1),by=Date]

Here is the explanation of this step. The first part creates what you call the final table in your question:

test[,sum(Value)/sum(Value1),by=key(test)]
#    Date ID V1
# 1:    1  A  3
# 2:    1  B 10
# 3:    1  C  2
# 4:    2  A  1
# 5:    2  B  2

This result is then chained into a second [ ] call, which takes the maximum of V1 by Date:

test[,sum(Value)/sum(Value1),by=key(test)][,max(V1),by=Date]
#    Date V1
# 1:    1 10
# 2:    2  2
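
The chain above relies on the auto-generated column name V1. As a minimal sketch of the same two steps with explicitly named columns: the names x and max_x and the inline sample data are illustrative choices, not part of the original answer, and the data.table package is assumed to be installed.

```r
library(data.table)

# Sample data from the question, built inline so the sketch is self-contained
test <- data.table(
  Date   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  ID     = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B"),
  Value  = c(20, 25, 50, 50, 25, 35, 30, 25, 40, 40),
  Value1 = c(10, 5, 5, 5, 25, 5, 10, 45, 10, 30)
)

# Step 1: ratio of sums per Date/ID, stored in a column named x (hypothetical name)
ratios <- test[, .(x = sum(Value) / sum(Value1)), by = .(Date, ID)]

# Step 2: maximum ratio per Date, stored in max_x (hypothetical name)
res <- ratios[, .(max_x = max(x)), by = Date]
res
```

Naming the columns keeps the second step readable, and it does not depend on setkey having been called first.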

Hope this helps. The package is very well documented, and it is worth reading more about it.

Upvotes: 2

akrun

Reputation: 887148

Maybe this helps.

test <- structure(list(Date = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
), ID = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B"), 
Value = c(20L, 25L, 50L, 50L, 25L, 35L, 30L, 25L, 40L, 40L
), Value1 = c(10L, 5L, 5L, 5L, 25L, 5L, 10L, 45L, 10L, 30L
)), .Names = c("Date", "ID", "Value", "Value1"), class = "data.frame", row.names = c(NA, 
-10L))


 res1 <- aggregate(. ~ID+Date, data=test, FUN=sum)
 res1 <- transform(res1, x=Value/Value1)
 res1
 #  ID Date Value Value1  x
 #1  A    1    45     15  3
 #2  B    1   100     10 10
 #3  C    1    60     30  2
 #4  A    2    55     55  1
 #5  B    2    80     40  2

 aggregate(. ~Date, data=res1[,-c(1,3:4)], FUN=max)
#   Date  x
# 1    1 10
# 2    2  2
  • First, I ran aggregate with two grouping variables (ID and Date) on both value columns, using the . ~ formula shorthand, which stands for all remaining columns.
  • Then I created a new variable x, i.e. Value/Value1, with transform.
  • Finally, I ran aggregate again with one grouping variable (Date), after dropping every column except Date and x.
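
The three steps above can also be sketched without dropping columns by index, by naming x directly in the formula. This is only an alternative sketch in base R; the object names sums and daily_max and the inline sample data are illustrative assumptions, not part of the answer.

```r
# Sample data from the question, built inline so the sketch is self-contained
test <- data.frame(
  Date   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  ID     = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B"),
  Value  = c(20, 25, 50, 50, 25, 35, 30, 25, 40, 40),
  Value1 = c(10, 5, 5, 5, 25, 5, 10, 45, 10, 30)
)

# Step 1: sum both value columns within each Date/ID group
sums <- aggregate(cbind(Value, Value1) ~ Date + ID, data = test, FUN = sum)

# Step 2: ratio of the two sums
sums$x <- sums$Value / sums$Value1

# Step 3: maximum ratio per Date; only x and Date are named in the formula,
# so the other columns are ignored rather than removed by position
daily_max <- aggregate(x ~ Date, data = sums, FUN = max)
daily_max
```

Referring to x by name means the call keeps working if the column order of the intermediate table changes.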

Upvotes: 0
