I am working with a large data set that has a time column and a wind speed column. I need to find a way to divide the data frame into smaller segments based on the time column. If my data frame is
hrmin wind
1100 x1
1100 x2
1100 x3
1101 x4
1101 x5
1101 x6
1102 x7
1102 x8
1102 x9
1103 x10
1103 x11
1103 x12
I need a function to divide it into smaller segments and then output those segments. If I wanted to divide it into two segments, my result would be
df1
1100 x1
1100 x2
1100 x3
1101 x4
1101 x5
1101 x6
df2
1102 x7
1102 x8
1102 x9
1103 x10
1103 x11
1103 x12
If I needed to output four data frames, I would have
df1
1100 x1
1100 x2
1100 x3
df2
1101 x4
1101 x5
1101 x6
df3
1102 x7
1102 x8
1102 x9
df4
1103 x10
1103 x11
1103 x12
I imagine I would need a function that incorporates split() and subset(), but I am not sure how to build it. I am thinking of something along the lines of
function(df, n_segments) {
  # split(df, <which time segment each row belongs to>)
  # return(<n_segments smaller data frames>)
}
Is there a way to do this, or perhaps something better than writing a function? I have found ways to show the smaller data frames, but ideally I would like them returned with names like df1, df2, df3... so I can work on them individually after they have been output.
This is very similar to @akrun's answer (maybe currently deleted):
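library(data.table)
setDT(DT)                     # DT is the data frame from the question; converted to a data.table in place
n <- 2                        # number of segments wanted
DT[, g := .GRP, by=hrmin]     # g = group ID 1, 2, ... for each distinct hrmin
split(DT, findInterval(
  DT$g,
  seq(1, uniqueN(DT$hrmin), length.out = n + 1),  # n + 1 evenly spaced breakpoints over the group IDs
  rightmost.closed = TRUE                         # keep the last group in the last segment
))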
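A minimal sketch of the snippet above wrapped in a function and run on the question's data. It assumes, as the snippet does, that the time column is named hrmin; the function name split_by_groups is hypothetical, and the wind values are placeholder strings standing in for x1..x12. It works on a copy so the caller's table is not modified by reference (setDT() and := in the original snippet do modify DT in place).

```r
library(data.table)

# Hypothetical wrapper around the answer's snippet; `n` is the number of segments.
split_by_groups <- function(DT, n) {
  DT <- copy(as.data.table(DT))     # work on a copy so the caller's table is untouched
  DT[, g := .GRP, by = hrmin]       # group ID 1, 2, ... in order of first appearance
  seg <- findInterval(DT$g,
                      seq(1, uniqueN(DT$hrmin), length.out = n + 1),
                      rightmost.closed = TRUE)
  DT[, g := NULL]                   # drop the helper column before splitting
  split(DT, seg)                    # list of data.tables, named "1", "2", ...
}

# The question's data; wind values are placeholders for x1..x12
DT <- data.table(hrmin = rep(1100:1103, each = 3),
                 wind  = paste0("x", 1:12))

split_by_groups(DT, 2)   # 2 segments: hrmin 1100-1101 and 1102-1103
split_by_groups(DT, 4)   # 4 segments: one per hrmin value
```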
It splits up the groups in order, simply based on the number of groups (and ignoring the number of rows in each group). You can vary n to see how it works. It's straightforward to put this into a function. It's also not hard to do this without data.table; it is simply used here for its nice shortcuts:
uniqueN(DT$hrmin) is the number of distinct values of the grouping variable.
.GRP, by=hrmin is an ID for the grouping variable, counting 1..uniqueN(DT$hrmin).
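Without data.table, base R's match() and unique() can stand in for .GRP and uniqueN(). The following is a sketch of that translation, not the answerer's code: split_segments and its time_col argument are hypothetical names, and the wind values are placeholders. It also names the pieces df1, df2, ... as the question asks.

```r
# Hypothetical base-R version of the same idea; no packages needed.
split_segments <- function(df, n, time_col = "hrmin") {
  tm  <- df[[time_col]]
  g   <- match(tm, unique(tm))          # group ID 1..k in order of first appearance (like .GRP)
  k   <- length(unique(tm))             # number of distinct time values (like uniqueN)
  seg <- findInterval(g, seq(1, k, length.out = n + 1), rightmost.closed = TRUE)
  parts <- split(df, seg)
  names(parts) <- paste0("df", seq_along(parts))  # df1, df2, ...
  parts
}

# The question's data; wind values are placeholders for x1..x12
df <- data.frame(hrmin = rep(1100:1103, each = 3),
                 wind  = paste0("x", 1:12))

segs <- split_segments(df, 4)
segs$df2          # the three rows with hrmin 1101

# Optionally copy df1, df2, ... into the workspace as separate objects
invisible(list2env(segs, envir = globalenv()))
```

Keeping the segments in a list is often more convenient (for example, lapply(segs, summary) works on all of them at once); list2env() is only needed if you really want separate df1, df2, ... objects.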