Reputation: 33
I've found lots of information for doing something similar to what I want to do, but nothing that seems like it will do exactly what I want.
I'm trying to chop / split / cut up a dataframe so that each column becomes its own dataframe, keeping its column name and the rownames.
To clarify, I am not trying to split the dataframe by the values in one column.
This is as close as I've got, using split.default and lapply:
dge.norm_split <- split.default(dge.norm, colnames(dge.norm))
out <- lapply(dge.norm_split, cbind, dge.norm[1])
lapply(out, head, 3)
returns
$`/s-mcpb-ms03/union/is/PRO3/Data/50-0142/20230628_PRO3_FA_001_50-0142_ObesityWeightLoss_P00_QC-001.d`
A matrix: 3 × 2 of type dbl
5672.1960 5672.196
159.7225 5672.196
1304.3856 5672.196
$`/s-mcpb-ms03/union/is/PRO3/Data/50-0142/20230628_PRO3_FA_012_50-0142_ObesityWeightLoss_P00_QC-002.d`
A matrix: 3 × 2 of type dbl
28822.54894 5672.196
837.19595 5672.196
93.87691 5672.196
(truncated)
So I think I've created a list containing each column as its own table, although these should be 3x1 matrices and I don't know why they are 3x2.
I also can't figure out how to then extract each one into its own variable without doing it manually (there are 29, so this is a bit laborious).
Some kind of for loop?
Thanks everyone for responding. I tried your suggestions but in the end I took the long way round to get the output I was looking for:
s1 <- data.frame(dge.norm[,-c(2:6)])
s2 <- data.frame(dge.norm[,-c(1, 3:6)])
s3 <- data.frame(dge.norm[,-c(1:2, 4:6)])
s4 <- data.frame(dge.norm[,-c(1:3, 5:6)])
s5 <- data.frame(dge.norm[,-c(1:4, 6)])
s6 <- data.frame(dge.norm[,-c(1:5)])
s1$model <- rep(list('s1'), 337)
s2$model <- rep(list('s2'), 337)
s3$model <- rep(list('s3'), 337)
s4$model <- rep(list('s4'), 337)
s5$model <- rep(list('s5'), 337)
s6$model <- rep(list('s6'), 337)
colnames(s1) <- c('count', 'model')
colnames(s2) <- c('count', 'model')
colnames(s3) <- c('count', 'model')
colnames(s4) <- c('count', 'model')
colnames(s5) <- c('count', 'model')
colnames(s6) <- c('count', 'model')
What I had originally was a gene count matrix where rownames = genes and colnames = samples. I merged and stacked the count data for each sample into one column, which meant repeating the rownames once per sample, and added a new column identifying the sample.
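For anyone landing here with more samples than it is comfortable to type out, the same reshaping can probably be done in a loop. This is only a sketch on a toy matrix: `counts`, `per_sample`, `long` and the gene and sample names are made up for illustration, and the real `dge.norm` would take the place of `counts`.

```r
# Toy stand-in for the count matrix (hypothetical names and values)
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Build one two-column data frame per sample instead of writing each by hand
per_sample <- lapply(colnames(counts), function(s) {
  data.frame(count = counts[, s], model = s, row.names = rownames(counts))
})
names(per_sample) <- colnames(counts)

# Stack the per-sample data frames into one long table
long <- do.call(rbind, per_sample)
```

Here `model` ends up as an ordinary character column rather than a list column, which is usually easier to work with downstream.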
Upvotes: 0
Views: 75
Reputation: 33488
split.default(iris, seq_along(iris)) # Thanks Rui B. for suggestion
Or with names() (convert to a factor first to avoid reordering):
split.default(iris, factor(names(iris), names(iris)))
Or reorder after the fact (as suggested by LMc):
split.default(iris, names(iris))[names(iris)]
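A sketch of how this might apply to the question's data, with two assumptions. First, the "A matrix" output in the question suggests `dge.norm` may be a matrix rather than a data frame; `split.default()` splits columns only for a list or data frame, so the sketch converts first. Second, `m`, `df`, `cols`, `env` and the shortened column names are hypothetical stand-ins.

```r
# Hypothetical stand-in for the question's object, with shortened names
m <- matrix(c(5672.2, 159.7, 1304.4, 28822.5, 837.2, 93.9), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("QC-001", "QC-002")))

# Convert so that split.default() works column by column
df <- as.data.frame(m)
cols <- split.default(df, factor(names(df), names(df)))

# Each element is a one-column data frame with its column name and row names
cols[["QC-001"]]

# Optional: copy every element into its own variable in a separate environment
env <- list2env(cols, envir = new.env())
get("QC-002", envir = env)
```

Keeping the pieces in the list (or in a dedicated environment) tends to be easier to manage than creating 29 variables in the global workspace.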
Upvotes: 3
Reputation: 76402
Coerce each column to a data.frame in an lapply loop, then assign the column names.
df_list <- mtcars |> lapply(as.data.frame)
df_list <- mapply(setNames, df_list, names(df_list), SIMPLIFY = FALSE)
str(df_list)
#> List of 11
#> $ mpg :'data.frame': 32 obs. of 1 variable:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ cyl :'data.frame': 32 obs. of 1 variable:
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> $ disp:'data.frame': 32 obs. of 1 variable:
#> ..$ disp: num [1:32] 160 160 108 258 360 ...
#> $ hp :'data.frame': 32 obs. of 1 variable:
#> ..$ hp: num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat:'data.frame': 32 obs. of 1 variable:
#> ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt :'data.frame': 32 obs. of 1 variable:
#> ..$ wt: num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec:'data.frame': 32 obs. of 1 variable:
#> ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
#> $ vs :'data.frame': 32 obs. of 1 variable:
#> ..$ vs: num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
#> $ am :'data.frame': 32 obs. of 1 variable:
#> ..$ am: num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear:'data.frame': 32 obs. of 1 variable:
#> ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb:'data.frame': 32 obs. of 1 variable:
#> ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
Created on 2024-05-24 with reprex v2.1.0
Upvotes: 1
Reputation: 17079
Similar to what @LMc suggested in a comment, you can subset by iterating over column names:
dfs <- lapply(names(iris), \(x) iris[x])
Result:
#> lapply(dfs, head)
[[1]]
Sepal.Length
1 5.1
2 4.9
3 4.7
4 4.6
5 5.0
6 5.4
[[2]]
Sepal.Width
1 3.5
2 3.0
3 3.2
4 3.1
5 3.6
6 3.9
[[3]]
Petal.Length
1 1.4
2 1.4
3 1.3
4 1.5
5 1.4
6 1.7
[[4]]
Petal.Width
1 0.2
2 0.2
3 0.2
4 0.2
5 0.2
6 0.4
[[5]]
Species
1 setosa
2 setosa
3 setosa
4 setosa
5 setosa
6 setosa
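Note that the list above is unnamed (`[[1]]`, `[[2]]`, ...). If you want to look up each data frame by its column name, one possible tweak is to name the list afterwards, or to use `sapply()` with `simplify = FALSE`; `dfs_named` and `dfs_alt` are just illustrative names.

```r
# Same subsetting as above, then label each element with its column name
dfs_named <- lapply(names(iris), \(x) iris[x])
names(dfs_named) <- names(iris)

# Alternative: sapply() keeps the character input as names
dfs_alt <- sapply(names(iris), \(x) iris[x], simplify = FALSE)

head(dfs_named$Species, 3)
```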
Upvotes: 3