Reputation: 63
I have a data frame with close to a million rows in it. I need an efficient way to subset the data based on multiple criteria. I can do this in a for loop, but I was wondering if there is a more elegant way.
Time Instance Server Metric Value
17/08/2014 04:00:00 PM ID1 Server888 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server999 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server777 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server888 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server999 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server777 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID2 Server888 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID2 Server888 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server999 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server999 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server777 disk.commandsaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server777 disk.commandsaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID2 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID2 Server888 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server999 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server777 disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID7 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID1 Server777 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID1 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID7 Server777 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID2 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID5 Server888 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID4 Server999 disk.numberwriteaveraged.average 0
17/08/2014 04:00:00 PM ID3 Server777 disk.numberwriteaveraged.average 0
17/08/2014 04:05:00 PM ID3 Server777 disk.numberwriteaveraged.average 0
What I want to do is create a subset where Metric == disk.numberwriteaveraged.average, Server is either Server999 or Server888, and only the Instance IDs that both servers have in common are kept.
Note: I use the term subset purely because I don't know of any other way to filter data in R; I'm still learning. I am looking for speed, as I will be generating data sets much larger than my current one.
Upvotes: 0
Views: 147
Reputation: 92302
(If I understand your question correctly) In your case, data.table is your friend. Try (assuming df is your data set):
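library(data.table)
# setDT() converts df to a data.table by reference (no copy); rows are then filtered within each Instance group.
# Replace the metric string with the one you need, e.g. "disk.numberwriteaveraged.average".
df2 <- setDT(df)[, .SD[Metric == "disk.commandsaveraged.average" &
                       Server %in% c("Server999", "Server888")], by = Instance]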
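The question also asks that only Instance IDs present on both servers be kept. A minimal sketch of one way to express that, assuming the column names from the question and the data.table function uniqueN (available in data.table 1.9.6 and later); the miniature df below is a hypothetical sample built for illustration, not the real data set:

```r
library(data.table)

# Hypothetical miniature of the data set, for illustration only
df <- data.table(
  Time     = c("17/08/2014 04:00:00 PM", "17/08/2014 04:00:00 PM",
               "17/08/2014 04:05:00 PM", "17/08/2014 04:05:00 PM",
               "17/08/2014 04:00:00 PM", "17/08/2014 04:00:00 PM"),
  Instance = c("ID1", "ID7", "ID1", "ID1", "ID2", "ID1"),
  Server   = c("Server888", "Server999", "Server888",
               "Server999", "Server888", "Server777"),
  Metric   = "disk.numberwriteaveraged.average",
  Value    = 0
)

servers <- c("Server888", "Server999")

# Step 1: vectorised row filter on metric and server (no loop needed)
sub <- df[Metric == "disk.numberwriteaveraged.average" & Server %in% servers]

# Step 2: keep only the Instance groups in which both servers occur;
# groups that fail the test return NULL and are dropped
common <- sub[, if (uniqueN(Server) == length(servers)) .SD, by = Instance]

print(common)
```

Filtering the rows first and grouping afterwards keeps the grouped step small, which may matter on much larger data sets; whether it is the fastest option for your data is worth benchmarking.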
Upvotes: 2