rightclickscript

Reputation: 63

Complex subset of a data.frame

I have a data frame with close to a million rows in it. I need an efficient way to subset the data based on multiple criteria. I can do this in a for loop, but I was wondering if there is a more elegant way to do it.

Time    Instance    Server  Metric  Value
17/08/2014 04:00:00 PM  ID1 Server888   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server999   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server777   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server888   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server999   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server777   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID2 Server888   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID2 Server888   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID3 Server999   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID3 Server999   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID3 Server777   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID3 Server777   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID2 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID2 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID3 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID3 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID3 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID3 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID7 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID1 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID1 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID1 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID7 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID2 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID5 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID3 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID4 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID3 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID3 Server777   disk.numberwriteaveraged.average    0

What I want to do is create a subset where Metric == disk.numberwriteaveraged.average, Server is either Server999 or Server888, AND both servers have the same Instance IDs in common.

NOTE: I use the term subset purely because I don't know of any other way to filter data in R; I am still learning. I am looking for speed, as I will be generating data sets much larger than my current one.

Upvotes: 0

Views: 147

Answers (1)

David Arenburg

Reputation: 92302

(If I understand your question correctly) In your case, data.table is your friend. Try (assuming df is your data set):

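library(data.table)
# setDT() converts df to a data.table by reference (no copy); rows are then filtered within each Instance
df2 <- setDT(df)[, .SD[Metric == "disk.numberwriteaveraged.average" & 
            (Server == "Server999" | Server == "Server888")], by = Instance]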
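
The snippet above keeps rows that match the metric and either server, but on its own it does not check that an Instance appears on both servers. Below is a minimal sketch of one way that extra condition might be added; it is not part of the original answer. It assumes a data.table version that provides `uniqueN`, and the sample `df`, the `wanted` vector, and the `filtered` and `common` names are made up for illustration.

```r
library(data.table)

# Small illustrative sample in the same shape as the question's data
# (hypothetical rows, not the real data set).
df <- data.frame(
  Instance = c("ID1", "ID1", "ID2", "ID3", "ID7", "ID1"),
  Server   = c("Server888", "Server999", "Server888",
               "Server999", "Server999", "Server777"),
  Metric   = "disk.numberwriteaveraged.average",
  Value    = 0,
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)
wanted <- c("Server888", "Server999")

# Step 1: keep only the metric and the two servers of interest.
filtered <- dt[Metric == "disk.numberwriteaveraged.average" & Server %in% wanted]

# Step 2: keep an Instance only if it has rows for both servers.
# Groups where the condition is FALSE return NULL and are dropped.
common <- filtered[, if (uniqueN(Server) == length(wanted)) .SD, by = Instance]

print(common)
```

With this sample, only the two ID1 rows remain, since ID1 is the only Instance present on both Server888 and Server999. Filtering first and grouping second keeps the per-group work small, which may matter on larger data sets.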

Upvotes: 2
