Reputation: 13
I have a large dataframe with 557 columns which I want to split into multiple dataframes with different numbers of columns. Below I explain what I would like to achieve, using a smaller dataframe.
My dataframe:
> df <- data.frame(row.names = c("x","y","z"),
"a" = c(2844.8,10232.5,20150.6),
"b" = c(1430.9,29263.6,26334.5),
"c" = c(906.2,6019.1,6848.6),
"REG01" = c(1871.0,69618.7,45032.2),
"d" = c(2106.0,29929.6,58626.1),
"e" = c(1818.8,232371.1,42713.6),
"REG02" = c(1364.5,57561.7,20656.4),
"f" = c(520.4,46754.9,9036.9),
"REG03" = c(1821.4,43862.3,51876.1))
> df
a b c REG01 d e REG02 f REG03
x 2844.8 1430.9 906.2 1871.0 2106.0 1818.8 1364.5 520.4 1821.4
y 10232.5 29263.6 6019.1 69618.7 29929.6 232371.1 57561.7 46754.9 43862.3
z 20150.6 26334.5 6848.6 45032.2 58626.1 42713.6 20656.4 9036.9 51876.1
Desired output - a list of 3 dataframes that looks like this:
> df.list[[1]]
a b c REG01
x 2844.8 1430.9 906.2 1871.0
y 10232.5 29263.6 6019.1 69618.7
z 20150.6 26334.5 6848.6 45032.2
> df.list[[2]]
d e REG02
x 2106.0 1818.8 1364.5
y 29929.6 232371.1 57561.7
z 58626.1 42713.6 20656.4
> df.list[[3]]
f REG03
x 520.4 1821.4
y 46754.9 43862.3
z 9036.9 51876.1
I'm really struggling to know where to start: the resulting dataframes will be different sizes, and the columns to split at have different names. My actual data is much larger (the result would be 44 dataframes), so I can't explicitly reference the column names, although they all start with REG followed by 2 digits.
Thanks for any suggestions you may have.
Upvotes: 1
Views: 561
Reputation: 79188
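You could use split.default, which treats the dataframe as a list of columns and splits it using one grouping value per column: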
split.default(df, c(0, cumsum(grepl("^REG", names(df)[-ncol(df)]))))
$`0`
a b c REG01
x 2844.8 1430.9 906.2 1871.0
y 10232.5 29263.6 6019.1 69618.7
z 20150.6 26334.5 6848.6 45032.2
$`1`
d e REG02
x 2106.0 1818.8 1364.5
y 29929.6 232371.1 57561.7
z 58626.1 42713.6 20656.4
$`2`
f REG03
x 520.4 1821.4
y 46754.9 43862.3
z 9036.9 51876.1
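To make it clearer how the grouping vector in that one-liner is built, here is the same idea broken into steps. This is only a sketch: the names is_reg and grp are mine, and I have tightened the pattern to "^REG[0-9]{2}$" on the assumption (from the question) that the split columns are always REG followed by exactly 2 digits. Naming the list elements after the REG columns also assumes the last column is a REG column, as it is here.

```r
df <- data.frame(row.names = c("x", "y", "z"),
                 a = c(2844.8, 10232.5, 20150.6),
                 b = c(1430.9, 29263.6, 26334.5),
                 c = c(906.2, 6019.1, 6848.6),
                 REG01 = c(1871.0, 69618.7, 45032.2),
                 d = c(2106.0, 29929.6, 58626.1),
                 e = c(1818.8, 232371.1, 42713.6),
                 REG02 = c(1364.5, 57561.7, 20656.4),
                 f = c(520.4, 46754.9, 9036.9),
                 REG03 = c(1821.4, 43862.3, 51876.1))

# TRUE for every column that closes a group (assumed pattern: REG + 2 digits)
is_reg <- grepl("^REG[0-9]{2}$", names(df))

# Count the REG columns seen *before* each column, so that a REG column
# stays in the group it closes instead of starting the next one
grp <- c(0, cumsum(is_reg[-length(is_reg)]))

# One data frame per group, columns kept in their original order
df.list <- split.default(df, grp)

# Optional: name each piece after its closing REG column
# (assumes every group ends in a REG column, including the last)
names(df.list) <- names(df)[is_reg]
```

With this, df.list[[2]] and df.list$REG02 both refer to the d, e, REG02 piece, which may be easier to keep track of once there are 44 of them.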
Upvotes: 3
Reputation: 4358
In base R:
lapply(split( as.data.frame(t(df)), cumsum(c(1,grepl("REG",colnames(df))))[1:ncol(df)]),t)
gives a list of matrices (t() returns a matrix, not a dataframe):
$`1`
a b c REG01
x 2844.8 1430.9 906.2 1871.0
y 10232.5 29263.6 6019.1 69618.7
z 20150.6 26334.5 6848.6 45032.2
$`2`
d e REG02
x 2106.0 1818.8 1364.5
y 29929.6 232371.1 57561.7
z 58626.1 42713.6 20656.4
$`3`
f REG03
x 520.4 1821.4
y 46754.9 43862.3
z 9036.9 51876.1
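One thing to be aware of with the transpose approach: t() returns a matrix, so the list elements are matrices rather than dataframes, and if the real data had mixed column types the transpose would coerce everything to a common type. A possible variant, sketched below, uses the same grouping vector but splits the column indices instead, so nothing is transposed. The names grp, mats and dfs are just for illustration.

```r
df <- data.frame(row.names = c("x", "y", "z"),
                 a = c(2844.8, 10232.5, 20150.6),
                 b = c(1430.9, 29263.6, 26334.5),
                 c = c(906.2, 6019.1, 6848.6),
                 REG01 = c(1871.0, 69618.7, 45032.2),
                 d = c(2106.0, 29929.6, 58626.1),
                 e = c(1818.8, 232371.1, 42713.6),
                 REG02 = c(1364.5, 57561.7, 20656.4),
                 f = c(520.4, 46754.9, 9036.9),
                 REG03 = c(1821.4, 43862.3, 51876.1))

# Same grouping vector as in the answer: 1 1 1 1 2 2 2 3 3
grp <- cumsum(c(1, grepl("REG", colnames(df))))[1:ncol(df)]

# Original approach: transpose, split by row, transpose back (matrices)
mats <- lapply(split(as.data.frame(t(df)), grp), t)

# Variant: split the column positions and subset, keeping data frames
dfs <- lapply(split(seq_along(df), grp),
              function(i) df[, i, drop = FALSE])
```

drop = FALSE is there so that a group consisting of a single column would still come back as a dataframe rather than a bare vector.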
Upvotes: 1