ANieder

Reputation: 223

Automatically split a character matrix by the values of one column into a variable number of new dataframes

I would like to split a character matrix according to the values in one of its columns. For example, suppose I have 3 columns and "n" rows, and I want to use column 2 as the reference. The script should look at the second column and group all rows that contain the same value into a dataframe.

So, say column 2 contains the values "A", "B", "C", "D" and "E" across the "n" rows. I want to get (in this case) 5 new dataframes, each containing all the rows that share one value in the second column. So all rows that contain "A" in the second column of the matrix go to one dataframe, and so on.

My data is much bigger: the column I want to use as the reference (column 2 in the above example) contains around 400 different character values. The process therefore needs to be automatic, meaning it has to detect how many new dataframes to create from the number of different values in "column 2".

Here is a shorter example of what I need:

structure(c("Hi", "Med", "Hi", "Low", "A", "D", "A", "C", "8", 
"3", "9", "9", "1", "1", "1", "2"), .Dim = c(4L, 4L), .Dimnames = list(
    NULL, c("b", "x", "y", "z")))

Here I would need to have 3 new dataframes if I use (again) column 2 ("x") as reference. One dataframe containing rows 1 and 3, another dataframe containing row 2 and a final one containing row 4, as there are 3 different values in that column: "A", "D" and "C".

The new dataframes should be named automatically after the value they are grouped by. So the first dataframe should be named "A", the second "D" and so on. Is it possible to automate this whole process for my bigger data?

I hope I was clear enough, and sorry if this was already answered before, but I couldn't find a solution that worked for me.

Upvotes: 0

Views: 1309

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

By the sounds of it, you're looking for the split function.

x <- structure(c("Hi", "Med", "Hi", "Low", 
                 "A", "D", "A", "C", 
                 "8", "3", "9", "9", 
                 "1", "1", "1", "2"), 
               .Dim = c(4L, 4L), 
               .Dimnames = list(NULL, c("b", "x", "y", "z")))
split(data.frame(x), x[, 2])
# $A
#    b x y z
# 1 Hi A 8 1
# 3 Hi A 9 1
# 
# $C
#     b x y z
# 4 Low C 9 2
# 
# $D
#     b x y z
# 2 Med D 3 1

The resulting data.frames are all in a single list, named after the values in column 2. If you actually want them as individual data.frames in your workspace, you can loop over the list's names and create each one with assign.
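
As a rough sketch of the assign approach, assuming the same example matrix as above: the names pieces and out are only illustrative. The data.frames are assigned into a separate environment here rather than the global workspace, since names such as "C" and "D" would otherwise mask the base R functions of the same name.

```r
# Same example matrix as in the question
x <- structure(c("Hi", "Med", "Hi", "Low",
                 "A", "D", "A", "C",
                 "8", "3", "9", "9",
                 "1", "1", "1", "2"),
               .Dim = c(4L, 4L),
               .Dimnames = list(NULL, c("b", "x", "y", "z")))

# Named list with one data.frame per distinct value in column "x"
pieces <- split(data.frame(x), x[, "x"])

# Create one object per list element, named after the grouping value.
# 'out' is an illustrative environment; use envir = .GlobalEnv instead
# if you really want the objects in your workspace.
out <- new.env()
for (nm in names(pieces)) {
  assign(nm, pieces[[nm]], envir = out)
}

ls(out)
# [1] "A" "C" "D"
```

With around 400 groups it is usually easier to keep everything in the list and access single groups with pieces[["A"]], or process them all with lapply. list2env(pieces, envir = out) does the same as the loop in a single call.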

Upvotes: 2
