Automatic split of a character matrix according to a column's values into a variable number of new dataframes

Question

I would like to split a character matrix according to the values in one of its columns. For example, if I have 3 columns and "n" rows and want to use column 2 as the reference, the script should look in the second column and group all rows that contain the same value into a dataframe.

So, say I have the values "A", "B", "C", "D" and "E" in column 2 across "n" rows. I want to get (in this case) 5 new dataframes, each containing all rows that share one value in the second column. So all rows that contain "A" in the second column of the matrix go to one dataframe, and so on.

My data is much bigger, with around 400 different character values in the column I want to use as the reference (column 2 in the above example), so this process needs to be automatic. By that I mean it has to automatically detect how many new dataframes should be created according to the number of different values in "column 2".

Here is a shorter example of what I need:

structure(c("Hi", "Med", "Hi", "Low", "A", "D", "A", "C", "8", 
"3", "9", "9", "1", "1", "1", "2"), .Dim = c(4L, 4L), .Dimnames = list(
    NULL, c("b", "x", "y", "z")))

Here I would need to have 3 new dataframes if I use (again) column 2 ("x") as reference. One dataframe containing rows 1 and 3, another dataframe containing row 2 and a final one containing row 4, as there are 3 different values in that column: "A", "D" and "C".

The new dataframes should be named automatically after the value they are grouped by. So the first dataframe should be named "A", the second "D" and so on. Is it possible to make this whole process automatic with my bigger data?

I hope I was clear enough, and sorry if this was already answered before, but I couldn't find a solution that worked for me.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

By the sounds of it, you're looking for the split function.

x <- structure(c("Hi", "Med", "Hi", "Low", 
                 "A", "D", "A", "C", 
                 "8", "3", "9", "9", 
                 "1", "1", "1", "2"), 
               .Dim = c(4L, 4L), 
               .Dimnames = list(NULL, c("b", "x", "y", "z")))
split(data.frame(x), x[, 2])
# $A
#    b x y z
# 1 Hi A 8 1
# 3 Hi A 9 1
# 
# $C
#     b x y z
# 4 Low C 9 2
# 
# $D
#     b x y z
# 2 Med D 3 1

The resulting data.frames are all in a single list, with one element per distinct value in column 2, named after that value. You can use assign if you want to actually split them into individual data.frames in your workspace.
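As a rough sketch of that last step, the list returned by split can be turned into individual objects either with base R's list2env or with a loop over assign. The names `pieces` and `target` are illustrative, not from the question. The example writes into a separate environment rather than the global workspace, which is an assumption made here for safety: with around 400 groups, dumping everything into the workspace could overwrite existing objects.

```r
x <- structure(c("Hi", "Med", "Hi", "Low",
                 "A", "D", "A", "C",
                 "8", "3", "9", "9",
                 "1", "1", "1", "2"),
               .Dim = c(4L, 4L),
               .Dimnames = list(NULL, c("b", "x", "y", "z")))

# One data.frame per distinct value in column "x", collected in a named list.
# The number of groups is detected automatically from the data.
pieces <- split(data.frame(x), x[, "x"])

length(pieces)  # number of groups found
names(pieces)   # group names, taken from the values in column "x"

# Option 1: copy every list element into an environment as its own object.
# `target` is a hypothetical scratch environment; use globalenv() instead
# if the objects really should live in the workspace.
target <- new.env()
list2env(pieces, envir = target)

# Option 2: the same result with an explicit loop over assign.
for (nm in names(pieces)) {
  assign(nm, pieces[[nm]], envir = target)
}

# Each group is now reachable by name, e.g. the rows where x == "A":
get("A", envir = target)
```

In many cases it is easier to leave the data.frames in the list and access them as pieces[["A"]] or loop over them with lapply, since that avoids creating hundreds of separate objects.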
