user2113499

Reputation: 1011

function for dividing data frame into segments in R

I am working with a large data set that has a time column and a wind speed column. I need a way to divide the data frame into smaller segments based on the time column. If my data frame is

hrmin    wind
1100     x1
1100     x2
1100     x3
1101     x4
1101     x5
1101     x6
1102     x7
1102     x8
1102     x9
1103     x10
1103     x11
1103     x12

I need a function to divide it into smaller segments then output those segments. If I wanted to divide it into two segments then my result is

df1
1100     x1
1100     x2
1100     x3
1101     x4
1101     x5
1101     x6

df2
1102     x7
1102     x8
1102     x9
1103     x10
1103     x11
1103     x12

If I need to output four data frames then I would have

df1
1100     x1
1100     x2
1100     x3

df2
1101     x4
1101     x5
1101     x6

df3
1102     x7
1102     x8
1102     x9

df4
1103     x10
1103     x11
1103     x12

I imagine I would need a function that incorporates split() and subset() but I am not sure how to build it. I am thinking something along the lines of

function( full data frame,number of segments I need) {

split(full data frame, subset(time segments))
return(appropriate amount of smaller data frames)

}

Is there a way to do this, or perhaps something better than writing a function? I have found ways that display the smaller data frames, but ideally I would like them returned with names like df1, df2, df3... so I can work on them individually after they are output.

Upvotes: 0

Views: 817

Answers (1)

Frank

Reputation: 66819

This is very similar to @akrun's answer (maybe currently deleted):

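library(data.table)
setDT(DT)  # DT is the full data frame; setDT converts it to a data.table by reference

n <- 2  # number of segments wanted

# g is a running ID for each distinct hrmin value: 1, 2, 3, ...
DT[, g := .GRP, by=hrmin]
# cut the IDs into n equal-width intervals and split the rows accordingly
split(DT, findInterval(
  DT$g, 
  seq(1, uniqueN(DT$hrmin), length.out = n + 1), 
  rightmost.closed = TRUE 
))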

It splits up the groups in order, simply based on the number of groups (and ignoring the number of rows in each group). You can vary n to see how it works. It's straightforward to put this into a function. It's also not hard to do this without data.table; it is simply used here for its nice shortcuts:

  • uniqueN(DT$hrmin) is the number of distinct values of the grouping variable.
  • .GRP, by=hrmin is an ID for the grouping variable, counting 1..uniqueN(DT$hrmin).
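
For reference, here is a rough base R sketch of the same idea wrapped in a function. The function name split_by_group and its col argument are made up for illustration, and the example data frame just mimics the one in the question. match() against unique() stands in for .GRP, and the result is a list whose elements are named df1, df2, ...

```r
# Hypothetical helper (not part of data.table or base R): split `df` into
# `n` consecutive segments based on the distinct values of column `col`.
split_by_group <- function(df, n, col = "hrmin") {
  # ID for each distinct value, in order of first appearance: 1, 2, 3, ...
  g <- match(df[[col]], unique(df[[col]]))
  # n + 1 evenly spaced break points across the group IDs
  breaks <- seq(1, max(g), length.out = n + 1)
  # segment number per row; the last break belongs to the last segment
  seg <- findInterval(g, breaks, rightmost.closed = TRUE)
  out <- split(df, seg)
  # name the pieces df1, df2, ... as asked for in the question
  names(out) <- paste0("df", seq_along(out))
  out
}

# Example data shaped like the data frame in the question
df <- data.frame(
  hrmin = rep(1100:1103, each = 3),
  wind  = paste0("x", 1:12)
)

segments <- split_by_group(df, 2)
segments$df1  # rows with hrmin 1100 and 1101
segments$df2  # rows with hrmin 1102 and 1103
```

Keeping the pieces in a list is usually the easier way to work with them (segments$df1, or lapply over all of them). If separate objects are really needed, list2env(segments, envir = globalenv()) should create df1, df2, ... in the workspace. Note that if n is larger than the number of distinct hrmin values, fewer than n segments come back.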

Upvotes: 1
