Elizabeth Mist

Reputation: 33

Separating Columns into individual dataframes in R

I've found lots of information for doing something similar to what I want to do, but nothing that seems like it will do exactly what I want.

I'm trying to split up a dataframe so that each column becomes its own dataframe, keeping its column name and the rownames.

To clarify, I am not trying to split the dataframe by the values in one column.

This is as close as I've got, using split.default and lapply:

dge.norm_split <- split.default(dge.norm, colnames(dge.norm))
out <- lapply(dge.norm_split, cbind, dge.norm[1])

lapply(out, head, 3)

returns


$`/s-mcpb-ms03/union/is/PRO3/Data/50-0142/20230628_PRO3_FA_001_50-0142_ObesityWeightLoss_P00_QC-001.d`
    A matrix: 3 × 2 of type dbl
    5672.1960   5672.196
    159.7225    5672.196
    1304.3856   5672.196
$`/s-mcpb-ms03/union/is/PRO3/Data/50-0142/20230628_PRO3_FA_012_50-0142_ObesityWeightLoss_P00_QC-002.d`
    A matrix: 3 × 2 of type dbl
    28822.54894 5672.196
    837.19595   5672.196
    93.87691    5672.196
(truncated)

So I think I've created a list containing each column as its own table, although these should be 3x1 matrices and I don't know why they are 3x2.
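A likely explanation for the extra column, sketched on a made-up matrix (`m` and its dimnames are hypothetical, and this assumes `dge.norm` is a numeric matrix, which the constant second column in the output above suggests): `cbind` is asked to bind `dge.norm[1]` onto every piece, and on a matrix `[1]` is a single element that gets recycled down the rows.

```r
# Hypothetical stand-in for dge.norm: 3 genes x 2 samples
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("sA", "sB")))

m[1]                              # a single value, not the first column
piece <- cbind(m[, "sB"], m[1])   # m[1] is recycled into a second column
dim(piece)

# drop = FALSE keeps a one-column matrix with its row and column names
one_col <- m[, "sB", drop = FALSE]
dim(one_col)
```

Dropping the `cbind` step, or subsetting with `drop = FALSE` as in the last lines, should give the single-column shape.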

I also can't figure out how to then extract each one into its own variable without doing it manually (there are 29, so this is a bit laborious).

Some kind of for loop?
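An explicit loop should not be needed for this step. As a minimal sketch, assuming the pieces are held in a named list (`pieces` and the names `s1`, `s2` are made up here), base R's `list2env` creates one variable per list element:

```r
# Hypothetical named list of one-column data frames
pieces <- list(s1 = data.frame(count = 1:3),
               s2 = data.frame(count = 4:6))

# Create variables s1 and s2 in a dedicated environment
# (pass envir = globalenv() to put them in the workspace instead)
env <- list2env(pieces, envir = new.env())
ls(env)
```

Names that are not valid R identifiers, such as file paths, would have to be quoted with backticks when used as variables, so keeping the pieces in the list and working on them with `lapply` is often easier than managing 29 separate objects.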

Solution:

Thanks everyone for responding. I tried your suggestions but in the end I took the long way round to get the output I was looking for:

s1 <- data.frame(dge.norm[,-c(2:6)])
s2 <- data.frame(dge.norm[,-c(1, 3:6)])
s3 <- data.frame(dge.norm[,-c(1:2, 4:6)])
s4 <- data.frame(dge.norm[,-c(1:3, 5:6)])
s5 <- data.frame(dge.norm[,-c(1:4, 6)])
s6 <- data.frame(dge.norm[,-c(1:5)])

s1$model <- rep(list('s1'), 337)
s2$model <- rep(list('s2'), 337)
s3$model <- rep(list('s3'), 337)
s4$model <- rep(list('s4'), 337)
s5$model <- rep(list('s5'), 337)
s6$model <- rep(list('s6'), 337)

colnames(s1) <- c('count', 'model')
colnames(s2) <- c('count', 'model')
colnames(s3) <- c('count', 'model')
colnames(s4) <- c('count', 'model')
colnames(s5) <- c('count', 'model')
colnames(s6) <- c('count', 'model')

What I had originally was a gene count matrix where rownames = genes and colnames = samples. I merged and stacked the count data for each sample into one column, which meant repeating the rownames once per sample, and added a new column identifying the sample.
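For reference, the same stacked layout can probably be built without writing out each sample by hand. This is only a sketch under assumptions: `counts` is a hypothetical stand-in for the count matrix, the `gene` column is an addition for illustration, and `model` is a plain character column rather than the list column produced above.

```r
# Hypothetical count matrix: rownames = genes, colnames = samples
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# as.vector() reads the matrix column by column, so each sample's
# counts stay together; rep(..., each = ) labels them to match
long <- data.frame(
  gene  = rep(rownames(counts), times = ncol(counts)),
  count = as.vector(counts),
  model = rep(colnames(counts), each = nrow(counts))
)
long
```

Because nothing here depends on the number of columns, the same lines work for 6 samples or 29.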

Upvotes: 0

Views: 75

Answers (3)

s_baldur

Reputation: 33488

split.default(iris, seq_along(iris)) # Thanks Rui B. for suggestion

Or with names() (turn first to factor to avoid reordering)

split.default(iris, factor(names(iris), names(iris)))

Or reorder after the fact (as suggested by LMc)

split.default(iris, names(iris))[names(iris)]
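To see the reordering that the factor version avoids, here is a small check on `iris` (the object names `by_name` and `by_level` are just for illustration):

```r
# Splitting by bare names sorts the pieces alphabetically
by_name <- split.default(iris, names(iris))
names(by_name)

# A factor whose levels follow the original order keeps the column order
by_level <- split.default(iris, factor(names(iris), levels = names(iris)))
names(by_level)

# Each piece is a one-column data frame
str(by_level$Species)
```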

Upvotes: 3

Rui Barradas

Reputation: 76402

Coerce each column to a data.frame in an lapply loop, then assign the column names.

df_list <- mtcars |> lapply(as.data.frame)
df_list <- mapply(setNames, df_list, names(df_list), SIMPLIFY = FALSE)

str(df_list)
#> List of 11
#>  $ mpg :'data.frame':    32 obs. of  1 variable:
#>   ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl :'data.frame':    32 obs. of  1 variable:
#>   ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp:'data.frame':    32 obs. of  1 variable:
#>   ..$ disp: num [1:32] 160 160 108 258 360 ...
#>  $ hp  :'data.frame':    32 obs. of  1 variable:
#>   ..$ hp: num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat:'data.frame':    32 obs. of  1 variable:
#>   ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt  :'data.frame':    32 obs. of  1 variable:
#>   ..$ wt: num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec:'data.frame':    32 obs. of  1 variable:
#>   ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
#>  $ vs  :'data.frame':    32 obs. of  1 variable:
#>   ..$ vs: num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
#>  $ am  :'data.frame':    32 obs. of  1 variable:
#>   ..$ am: num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear:'data.frame':    32 obs. of  1 variable:
#>   ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
#>  $ carb:'data.frame':    32 obs. of  1 variable:
#>   ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...

Created on 2024-05-24 with reprex v2.1.0
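One caveat worth checking, since the question asks to keep the rownames: `lapply` hands each column over as a bare vector, so the resulting data frames get default row names. A possible way to restore them, sketched on `mtcars` (the variable `before` is only there to show the default names):

```r
df_list <- lapply(mtcars, as.data.frame)
df_list <- mapply(setNames, df_list, names(df_list), SIMPLIFY = FALSE)

before <- rownames(df_list$mpg)[1:3]   # default row names, car names are gone

# Copy the row names back from the source data frame
df_list <- lapply(df_list, `rownames<-`, rownames(mtcars))
head(df_list$mpg, 3)
```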

Upvotes: 1

zephryl

Reputation: 17079

Similar to what @LMc suggested in a comment, you can subset by iterating over column names:

dfs <- lapply(names(iris), \(x) iris[x])

Result:

#> lapply(dfs, head)
[[1]]
  Sepal.Length
1          5.1
2          4.9
3          4.7
4          4.6
5          5.0
6          5.4

[[2]]
  Sepal.Width
1         3.5
2         3.0
3         3.2
4         3.1
5         3.6
6         3.9

[[3]]
  Petal.Length
1          1.4
2          1.4
3          1.3
4          1.5
5          1.4
6          1.7

[[4]]
  Petal.Width
1         0.2
2         0.2
3         0.2
4         0.2
5         0.2
6         0.4

[[5]]
  Species
1  setosa
2  setosa
3  setosa
4  setosa
5  setosa
6  setosa
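The list above is unnamed. If the pieces need to be looked up by column name, wrapping the call in `setNames` is one option. The sketch below uses `mtcars` instead of `iris` so that the row names are meaningful; single-bracket subsetting with a column name should keep both the column name and the row names.

```r
# Name the list elements after the columns they hold
dfs <- setNames(lapply(names(mtcars), \(x) mtcars[x]), names(mtcars))

head(dfs$mpg, 3)   # one column, car names kept as row names
```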

Upvotes: 3
