Split Data Frame Into N Data Frames Based On Column Names

Question

I have a large data frame (thousands of columns) in which a few columns have duplicate column names. There is also a set of columns whose names share the part before the "-" but differ in the part after it.

Using R and the two properties above, I want to split all such columns into separate data frames for further analysis. To achieve this, I want to apply the following dynamic logic to the data frame:

First: Find duplicate column names and cbind() the columns sharing each name into their own data frame. For example, if 10 columns share one name they form one data frame, and 5 columns sharing another name form another data frame.
Second: Find columns whose names match another column's name in the part before the "-" but differ in the part after it, and cbind() each such set into its own data frame.
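For comparison, the first rule on its own can be sketched in base R as below. This is a minimal sketch assuming "duplicate" means the full column name occurs more than once; the helper name split_exact_dups and the toy data frame are hypothetical.

```r
# Hypothetical helper illustrating the first rule only: returns one data
# frame per column name that occurs more than once in `df`.
split_exact_dups <- function(df) {
  nms <- names(df)
  # Names that appear at least twice
  dup_names <- unique(nms[duplicated(nms)])
  # List-style `[` keeps every column carrying that name; R renames the
  # repeats inside each result (e.g. "A-DIODE.1")
  out <- lapply(dup_names, function(n) df[nms == n])
  names(out) <- dup_names
  out
}

# Toy input; check.names = FALSE stops data.frame() from renaming duplicates
toy <- data.frame(`A-DIODE` = c(1.2, 0.4), `A-DIODE` = c(1.3, 0.6),
                  `B-DIODE` = c(1.4, 0.8), check.names = FALSE)
res <- split_exact_dups(toy)
res
```

Note that the expected output below puts all "B-" columns together, including repeated names such as B-ACC1, so in practice the two rules seem to reduce to grouping by the prefix before the "-".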

Below is sample input data (the real data is too big to post, but it has exactly the same properties). Based on the logic above, the first two columns should form one data frame, and the columns from the third to the last should form another.

I tried split(), but that hasn't worked out so far. Any suggestions on how I can do this?

Sample Input Data

df1 <- structure(list(`A-DIODE` = c(1.2, 0.4), `A-DIODE` = c(1.3, 0.6
), `B-DIODE` = c(1.4, 0.8), `B-ACC1` = c(1.5, 1), `B-ACC2` = c(1.6, 
1.2), `B-ANA0` = c(1.7, 1.4), `B-ANA1` = c(1.8, 1.6), `B-BRICKID` = c(1.9, 
1.8), `B-CC0` = c(2L, 2L), `B-CC1` = c(2.1, 2.2), `B-DIGDN` = c(2.2, 
2.4), `B-DIGDP` = c(2.3, 2.6), `B-DN1` = c(2.4, 2.8), `B-DN2` = c(2.5, 
3), `B-DP1` = c(2.6, 3.2), `B-DP2` = c(2.7, 3.4), `B-SCL` = c(2.8, 
3.6), `B-SDA` = c(2.9, 3.8), `B-USB0DN` = 3:4, `B-USB0DP` = c(3.1, 
4.2), `B-USB1DN` = c(3.2, 4.4), `B-USB1DP` = c(3.3, 4.6), `B-ACC1` = c(3.4, 
4.8), `B-ACC2` = c(3.5, 5), `B-ANA0` = c(3.6, 5.2), `B-ANA1` = c(3.7, 
5.4), `B-BRICKID` = c(3.8, 5.6), `B-CC0` = c(3.9, 5.8), `B-CC1` = c(4L, 
6L), `B-DIGDN` = c(4.1, 6.2), `B-DIGDP` = c(4.2, 6.4), `B-DN1` = c(4.3, 
6.6), `B-DN2` = c(4.4, 6.8), `B-DP1` = c(4.5, 7), `B-DP2` = c(4.6, 
7.2), `B-SCL` = c(4.7, 7.4), `B-SDA` = c(4.8, 7.6), `B-USB0DN` = c(4.9, 
7.8), `B-USB0DP` = c(5L, 8L), `B-USB1DN` = c(5.1, 8.2), `B-USB1DP` = c(5.2, 
8.4), `B-NA` = c(5.3, 8.6), `B-ACC2PWRLKG_0v4` = c(5.4, 8.8), 
`B-ACC2PWRLKG_0v4` = c(5.5, 9), `B-P_IN_Leak` = c(5.6, 9.2
)), class = "data.frame", row.names = c(NA, -2L))

Output Based On Logic Discussed Above

Data Frame 1

A-DIODE A-DIODE
1.2     1.3
0.4     0.6

Data Frame 2

B-DIODE B-ACC1 B-ACC2 B-ANA0 B-ANA1 B-BRICKID B-CC0 B-CC1 B-DIGDN B-DIGDP B-DN1 B-DN2 B-DP1 B-DP2 B-SCL B-SDA B-USB0DN B-USB0DP
1.4    1.5    1.6    1.7    1.8       1.9     2   2.1     2.2     2.3   2.4   2.5   2.6   2.7   2.8   2.9        3      3.1
0.8    1.0    1.2    1.4    1.6       1.8     2   2.2     2.4     2.6   2.8   3.0   3.2   3.4   3.6   3.8        4      4.2
B-USB1DN B-USB1DP B-ACC1.1 B-ACC2.1 B-ANA0.1 B-ANA1.1 B-BRICKID.1 B-CC0.1 B-CC1.1 B-DIGDN.1 B-DIGDP.1 B-DN1.1 B-DN2.1 B-DP1.1
3.2      3.3      3.4      3.5      3.6      3.7         3.8     3.9       4       4.1       4.2     4.3     4.4     4.5
4.4      4.6      4.8      5.0      5.2      5.4         5.6     5.8       6       6.2       6.4     6.6     6.8     7.0
B-DP2.1 B-SCL.1 B-SDA.1 B-USB0DN.1 B-USB0DP.1 B-USB1DN.1 B-USB1DP.1 B-NA B-ACC2PWRLKG_0v4 B-ACC2PWRLKG_0v4.1 B-P_IN_Leak
4.6     4.7     4.8        4.9          5        5.1        5.2  5.3              5.4                5.5         5.6
7.2     7.4     7.6        7.8          8        8.2        8.4  8.6              8.8                9.0         9.2
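The .1 suffixes in the expected output match base R's make.unique() naming scheme. List-style column subsetting with [ applies it when the selected columns share a name, which is likely why the answer's output shows A-DIODE.1. A small sketch with hypothetical values:

```r
# make.unique() appends ".1", ".2", ... to the second and later occurrences
make.unique(c("B-ACC1", "B-ACC2", "B-ACC1", "B-ACC1"))
# -> "B-ACC1" "B-ACC2" "B-ACC1.1" "B-ACC1.2"

# Subsetting a data frame's columns with `[` triggers the same renaming
toy <- setNames(data.frame(1, 2), c("A-DIODE", "A-DIODE"))
names(toy[1:2])
# -> "A-DIODE" "A-DIODE.1"
```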

akrun · Accepted Answer

We can use split.default() on a substring of the column names (the part before the first "-"):

split.default(df1, sub("-.*", "", names(df1)))
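To see why sub("-.*", "", ...) yields the grouping key, here is a small sketch on a few of the question's column names, using a hypothetical toy data frame in place of df1:

```r
# Sample column names taken from the question's data
nms <- c("A-DIODE", "A-DIODE", "B-DIODE", "B-ACC2PWRLKG_0v4", "B-P_IN_Leak")

# "-.*" matches from the first "-" to the end of the string (the leftmost
# match wins and ".*" is greedy), so sub() leaves only the prefix
prefix <- sub("-.*", "", nms)
prefix
# -> "A" "A" "B" "B" "B"

# A toy data frame with those names; split.default() groups its columns
# by the prefix vector, giving one data frame per distinct prefix
toy <- setNames(as.data.frame(matrix(1:10, nrow = 2)), nms)
parts <- split.default(toy, prefix)
names(parts)
sapply(parts, ncol)
```

Names without any "-" are returned unchanged by sub(), so each such name becomes its own grouping key.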

Or, if we know there is always exactly one character before the "-":

split.default(df1, substr(names(df1), 1, 1))
#$A
#  A-DIODE A-DIODE.1
#1     1.2       1.3
#2     0.4       0.6

#$B
#  B-DIODE B-ACC1 B-ACC2 B-ANA0 B-ANA1 B-BRICKID B-CC0 B-CC1 B-DIGDN B-DIGDP B-DN1 B-DN2 B-DP1 B-DP2 B-SCL B-SDA B-USB0DN B-USB0DP
#1     1.4    1.5    1.6    1.7    1.8       1.9     2   2.1     2.2     2.3   2.4   2.5   2.6   2.7   2.8   2.9        3      3.1
#2     0.8    1.0    1.2    1.4    1.6       1.8     2   2.2     2.4     2.6   2.8   3.0   3.2   3.4   3.6   3.8        4      4.2
#  B-USB1DN B-USB1DP B-ACC1.1 B-ACC2.1 B-ANA0.1 B-ANA1.1 B-BRICKID.1 B-CC0.1 B-CC1.1 B-DIGDN.1 B-DIGDP.1 B-DN1.1 B-DN2.1 B-DP1.1 B-DP2.1
#1      3.2      3.3      3.4      3.5      3.6      3.7         3.8     3.9       4       4.1       4.2     4.3     4.4     4.5     4.6
#2      4.4      4.6      4.8      5.0      5.2      5.4         5.6     5.8       6       6.2       6.4     6.6     6.8     7.0     7.2
#  B-SCL.1 B-SDA.1 B-USB0DN.1 B-USB0DP.1 B-USB1DN.1 B-USB1DP.1 B-NA B-ACC2PWRLKG_0v4 B-ACC2PWRLKG_0v4.1 B-P_IN_Leak
#1     4.7     4.8        4.9          5        5.1        5.2  5.3              5.4                5.5         5.6
#2     7.4     7.6        7.8          8        8.2        8.4  8.6              8.8                9.0         9.2
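The substr() shortcut only gives the same groups as the sub() version when every prefix is exactly one character long. A quick check with hypothetical names shows where the two keys diverge:

```r
# Hypothetical names where one prefix is longer than one character
nms <- c("A-DIODE", "AB-DIODE", "B-ACC1")

# substr(x, 1, 1) always keeps exactly the first character,
# so "AB-DIODE" would land in group "A"
substr(nms, 1, 1)
# -> "A" "A" "B"

# sub() keeps everything before the first "-"
sub("-.*", "", nms)
# -> "A" "AB" "B"
```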
