sgoldie

Splitting a dataframe into multiple dataframes based on the column name in R

I have a large dataframe with 557 columns, which I want to split into multiple dataframes with different numbers of columns. Below I illustrate what I would like to achieve with a smaller dataframe.

My dataframe:

> df <- data.frame(row.names = c("x","y","z"),
                 "a" = c(2844.8,10232.5,20150.6),
                 "b" = c(1430.9,29263.6,26334.5),
                 "c" = c(906.2,6019.1,6848.6),
                 "REG01" = c(1871.0,69618.7,45032.2),
                 "d" = c(2106.0,29929.6,58626.1),
                 "e" = c(1818.8,232371.1,42713.6),
                 "REG02" = c(1364.5,57561.7,20656.4),
                 "f" = c(520.4,46754.9,9036.9),
                 "REG03" = c(1821.4,43862.3,51876.1))

> df

        a       b       c   REG01       d        e    REG02        f   REG03
x  2844.8  1430.9   906.2  1871.0  2106.0   1818.8   1364.5    520.4  1821.4
y 10232.5 29263.6  6019.1 69618.7 29929.6 232371.1  57561.7  46754.9 43862.3
z 20150.6 26334.5  6848.6 45032.2 58626.1  42713.6  20656.4   9036.9 51876.1

Desired output: a list of 3 dataframes that look like this:

> df.list[[1]]

        a       b       c   REG01       
x  2844.8  1430.9   906.2  1871.0  
y 10232.5 29263.6  6019.1 69618.7 
z 20150.6 26334.5  6848.6 45032.2 

> df.list[[2]]

         d        e    REG02
x   2106.0   1818.8   1364.5
y  29929.6 232371.1  57561.7
z  58626.1  42713.6  20656.4

> df.list[[3]]

        f   REG03
x   520.4  1821.4
y 46754.9 43862.3
z  9036.9 51876.1

I'm struggling to know where to start: the resulting dataframes will have different numbers of columns, the columns to split at have different names, and my actual data is much larger (the result would be 44 dataframes), so I can't reference the column names explicitly. They do, however, all start with REG followed by two digits.

Thanks for any suggestions you may have.

Answers (2)

Onyambu

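You could use split.default, which splits the columns of a data frame (rather than its rows) into groups. The grouping vector counts the REG columns seen so far, shifted by one position so that each REG column stays in the same group as the columns before it: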

split.default(df, c(0, cumsum(grepl("^REG", names(df)[-ncol(df)]))))

$`0`
        a       b      c   REG01
x  2844.8  1430.9  906.2  1871.0
y 10232.5 29263.6 6019.1 69618.7
z 20150.6 26334.5 6848.6 45032.2

$`1`
        d        e   REG02
x  2106.0   1818.8  1364.5
y 29929.6 232371.1 57561.7
z 58626.1  42713.6 20656.4

$`2`
        f   REG03
x   520.4  1821.4
y 46754.9 43862.3
z  9036.9 51876.1
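To see why this works, it can help to print the grouping vector on its own. The sketch below rebuilds the example data and steps through it. The stricter pattern "^REG[0-9]{2}$" is an assumption based on the question's description of the names (REG followed by two digits), the variable names is_reg and grp are illustrative, and naming the list by each group's REG column is an optional extra, not part of the answer above.

```r
# Rebuild the example data from the question
df <- data.frame(row.names = c("x", "y", "z"),
                 a = c(2844.8, 10232.5, 20150.6),
                 b = c(1430.9, 29263.6, 26334.5),
                 c = c(906.2, 6019.1, 6848.6),
                 REG01 = c(1871.0, 69618.7, 45032.2),
                 d = c(2106.0, 29929.6, 58626.1),
                 e = c(1818.8, 232371.1, 42713.6),
                 REG02 = c(1364.5, 57561.7, 20656.4),
                 f = c(520.4, 46754.9, 9036.9),
                 REG03 = c(1821.4, 43862.3, 51876.1))

# TRUE for the columns that close a group (assumed pattern: "REG" + two digits)
is_reg <- grepl("^REG[0-9]{2}$", names(df))
# values: FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE

# Drop the last flag and prepend 0, so the counter only increases on the
# column *after* each REG column; each REG column stays in its own group
grp <- c(0, cumsum(is_reg[-ncol(df)]))
# values: 0 0 0 0 1 1 1 2 2

# split.default treats the data frame as a list of columns
df.list <- split.default(df, grp)

# Optional: name each piece after the REG column that ends it
# (assumes the last column of df is a REG column, as in the question)
names(df.list) <- names(df)[is_reg]
```

Because split.default never transposes the data, each piece keeps its original column types and row names, which matters if the real data mixes numeric and character columns.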

Daniel O

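Alternatively, in base R, transpose the data frame so that the columns become rows, split those rows, and transpose each piece back. Note that t() returns a matrix, so each element of the result is a matrix rather than a data frame: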

lapply(split(as.data.frame(t(df)), cumsum(c(1, grepl("REG", colnames(df))))[1:ncol(df)]), t)

gives

$`1`
        a       b      c   REG01
x  2844.8  1430.9  906.2  1871.0
y 10232.5 29263.6 6019.1 69618.7
z 20150.6 26334.5 6848.6 45032.2

$`2`
        d        e   REG02
x  2106.0   1818.8  1364.5
y 29929.6 232371.1 57561.7
z 58626.1  42713.6 20656.4

$`3`
        f   REG03
x   520.4  1821.4
y 46754.9 43862.3
z  9036.9 51876.1
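If data frames are needed, each transposed piece can be converted back with as.data.frame(). A minimal sketch, reusing the grouping expression from the answer above (grp is just an illustrative name). Because t() goes through a matrix, this only preserves column types when all columns share one type, as they do here; with mixed types every column would be coerced to a common type (e.g. character), in which case the split.default approach is safer.

```r
# Example data from the question (all columns numeric)
df <- data.frame(row.names = c("x", "y", "z"),
                 a = c(2844.8, 10232.5, 20150.6),
                 b = c(1430.9, 29263.6, 26334.5),
                 c = c(906.2, 6019.1, 6848.6),
                 REG01 = c(1871.0, 69618.7, 45032.2),
                 d = c(2106.0, 29929.6, 58626.1),
                 e = c(1818.8, 232371.1, 42713.6),
                 REG02 = c(1364.5, 57561.7, 20656.4),
                 f = c(520.4, 46754.9, 9036.9),
                 REG03 = c(1821.4, 43862.3, 51876.1))

# Group id per column: starts at 1 and increases on the column after each "REG" column
grp <- cumsum(c(1, grepl("REG", colnames(df))))[1:ncol(df)]

# Split the transposed data (columns become rows), then transpose each
# piece back and convert the resulting matrix into a data frame
df.list <- lapply(split(as.data.frame(t(df)), grp),
                  function(part) as.data.frame(t(part)))
```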
