Eduardo

Reputation: 4382

Grouping R variables based on sub-groups

I have data formatted as

PERSON_A PERSON_B MEET LEAVE

Each row describes a meeting: PERSON_A met PERSON_B at time MEET, and they said "bye" to each other at time LEAVE. Times are expressed in seconds, and a small sample of the data is available at http://pastie.org/2825794 (simple.dat).

What I need is to count the number of meetings, grouped by day. At the moment I have code that works, but it is not pretty. I'd like help transforming it into code that reflects the grouping I'm trying to do, e.g. using ddply. My main aim is to learn from this case; there are probably many mistakes in this code with regard to good R practice.

library(plyr)
data = read.table("simple.dat", stringsAsFactors=FALSE)
names(data)=c('PERSON_A','PERSON_B','MEET','LEAVE')
attach(data)

min_interval = min(MEET)
max_interval = max(LEAVE)
interval = max_interval - min_interval
day = 86400
number_of_days = floor(interval/day)

g = data.frame(MEETINGS=c(0:number_of_days))     # just to store the result
g[,1] = 0

start_offset = min_interval                       # start of the first day
for (interval in c(0:number_of_days)) {
    end_offset = start_offset + day
    meetings = (length(data[data$MEET >= start_offset & data$LEAVE <= end_offset, ]$PERSON_A) + length(data[data$MEET >= start_offset & data$LEAVE <= end_offset, ]$PERSON_B))
    g[interval+1, ] = meetings
    start_offset = end_offset             # start next day
}
g

This code iterates over the days (intervals of 86400 seconds) and stores the number of meetings in the data frame g. The correct output (shown below) for the linked dataset gives, on each line (day), the number of meetings.

       MEETINGS
1        38
2        10
3        16
4        18
5        24
6         6
7         4
8        10
9        28
10       14
11       22
12        2
13 .. 44   0         # I simplified the output here
45        2

Anyway, I know that I could use ddply to get the number of meetings for each pair of nodes:

contacts <- ddply(data, .(PERSON_A, PERSON_B), summarise
 , CONTACTS = length(c(PERSON_A, PERSON_B)) /2
)

but there is a big gap for me between this and the result I need.

As a final note, I read "How to make a great R reproducible example?" and tried my best :)

Thanks,

Upvotes: 0

Views: 235

Answers (1)

kohske

Reputation: 66902

Try this:

> d2 <- transform(data, m = floor(MEET/86400) + 1, l = floor(LEAVE/86400) + 1)
> d3 <- subset(d2, m == l)
> table(d3$m) * 2

 1  2  3  4  5  6  7  8  9 10 11 12 45 
38 10 16 18 24  6  4 10 28 14 22  2  2 

floor(x/(60*60*24)) is a quick way to convert seconds into a day index.
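
For anyone who wants to try the idea without downloading simple.dat, here is a minimal sketch of the same approach on a small made-up data frame. The data values and the names meetings, same_day, all_days and counts are assumptions for illustration only. It also tabulates over a factor with explicit levels, so days without any meeting appear as 0 (as in the question's output) instead of being dropped. As in the question, each meeting is counted twice, once per person. Note that the day boundaries here fall on multiples of 86400 seconds, whereas the loop in the question starts its first day at min(MEET), so the two can bucket meetings differently on other data.

```r
# Hypothetical toy data (not the poster's simple.dat); times in seconds.
meetings <- data.frame(
  PERSON_A = c("a", "a", "b", "c"),
  PERSON_B = c("b", "c", "c", "d"),
  MEET     = c(100, 200, 86500, 3 * 86400 + 10),
  LEAVE    = c(150, 90000, 86900, 3 * 86400 + 50),
  stringsAsFactors = FALSE
)

day <- 86400  # seconds per day

# Day index of each timestamp, counted from day 1.
meetings$m <- floor(meetings$MEET / day) + 1
meetings$l <- floor(meetings$LEAVE / day) + 1

# Keep only meetings that start and end on the same day.
same_day <- meetings[meetings$m == meetings$l, ]

# Tabulate over every day in the range so that empty days show up as 0.
# The factor of 2 counts both participants, as the question's loop does.
all_days <- seq_len(max(meetings$l))
counts <- table(factor(same_day$m, levels = all_days)) * 2
print(counts)
```

In this toy data the second meeting spans two days, so it is excluded, just as the m == l filter in the answer (and the MEET/LEAVE window test in the question's loop) would exclude it.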

Upvotes: 4
