Reputation: 35
I need to split a dataframe into 17,872 dataframes based on a header row that recurs throughout the dataframe, and store the newly created dataframes in a list.
My dataframe looks like:
0 1 2
. . . .
. . . .
. . . .
. . . .
32 Alert Type Response
33 w1 x1 y1
34 w2 x2 y2
. . . .
. . . .
. . . .
. . . .
144 Alert Type Response
145 a1 b1 c1
146 a2 b2 c2
I want to create a new dataframe every time that the row containing "Alert, Type, Response" appears.
I have a hacky way of getting the outcome: I create vectors containing the start and end row values to determine where each dataframe should start and stop, and then use lapply:
# seq_along() is safer than 1:length() if start_rows is ever empty
list_data <- lapply(seq_along(start_rows),
                    function(x) data[start_rows[x]:end_rows[x], ])
This works; however, I am looking for a way to do this without having to determine the row value vectors, as I have 50 other dataframes that also need to be split into 17,000+ smaller dataframes.
Upvotes: 2
Views: 722
Reputation: 1382
You are probably looking for the split() function. I made a small example where I split every time the b column is equal to "a":
(d <- data.frame(a = 1:10, b = sample(letters[1:3], replace = TRUE, size = 10)))
#> a b
#> 1 1 a
#> 2 2 a
#> 3 3 c
#> 4 4 b
#> 5 5 c
#> 6 6 b
#> 7 7 c
#> 8 8 b
#> 9 9 c
#> 10 10 a
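# Running count of "a" rows: the counter increases by one at every "a",
# so each "a" row starts a new group
d$f <- cumsum(d$b == "a")
lst <- split(d, d$f)
lst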
#> $`1`
#> a b f
#> 1 1 a 1
#>
#> $`2`
#> a b f
#> 2 2 a 2
#> 3 3 c 2
#> 4 4 b 2
#> 5 5 c 2
#> 6 6 b 2
#> 7 7 c 2
#> 8 8 b 2
#> 9 9 c 2
#>
#> $`3`
#> a b f
#> 10 10 a 3
Created on 2021-10-05 by the reprex package (v2.0.1)
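Applied to data shaped like the one in the question, the same cumsum() plus split() idea could look roughly like the sketch below. The toy data, the helper name split_on_header, and the assumption that the header values sit in the first three columns are all made up for illustration; rows that come before the first header are dropped, and each piece keeps its header row as its first row.

```r
# Toy stand-in for the data in the question (values are invented):
# columns named 0, 1, 2 and a repeated "Alert", "Type", "Response" row.
data <- data.frame(
  `0` = c("junk", "Alert", "w1", "w2", "Alert", "a1", "a2"),
  `1` = c("junk", "Type", "x1", "x2", "Type", "b1", "b2"),
  `2` = c("junk", "Response", "y1", "y2", "Response", "c1", "c2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: split `df` every time the header row reappears.
split_on_header <- function(df, header = c("Alert", "Type", "Response")) {
  # TRUE where the first three columns match the header;
  # %in% treats NA as "no match" instead of propagating NA
  is_header <- df[[1]] %in% header[1] &
    df[[2]] %in% header[2] &
    df[[3]] %in% header[3]
  # Running count of headers seen so far; 0 means "before the first header"
  grp <- cumsum(is_header)
  keep <- grp > 0
  # unname() drops the "1", "2", ... names that split() adds
  unname(split(df[keep, , drop = FALSE], grp[keep]))
}

list_data <- split_on_header(data)

# The same helper can be mapped over several data frames at once
# (here just two copies of the toy data).
all_splits <- lapply(list(data, data), split_on_header)
```

Because the grouping vector is computed from the data itself, no start or end row vectors are needed, which should make it easier to reuse on the other dataframes.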
Upvotes: 3