Kaylee Rowland

Reputation: 23

Splitting large data frame by column into smaller data frames (not lists) using loops

I have many large data frames. Using one of the smaller ones as an example:

dim(ch29)  
476 4283  

I need to split it into smaller pieces (i.e. subsets of at most 241 columns each). My problems come afterwards, when I want to analyze these smaller subsets.

I do not know how to subset the large data frame into smaller data frames rather than a single list.

I also want to do all of this in a loop and give the newly created smaller data frames unique names in the loop.

chunk=241
df<-ch29
n<-ceiling(ncol(df)/chunk)

for (i in 1:n) {
  xname <- paste("ch29", i, sep="_")
  cat("_", xname)
  assign(xname, split(df, rep(1:n, each=chunk, length.out=ncol(df))))
}

Upvotes: 1

Views: 603

Answers (2)

eipi10

Reputation: 93761

I'm not exactly sure what you're trying to do or how you want to choose the columns that go in each data frame, but here's an example of one option:

# Fake data
set.seed(100)
ch29 = as.data.frame(replicate(4283, rnorm(476)))

# Number of columns we want in each split data frame
ncols = floor(ncol(ch29)/20)

# Start column for each split data frame
start = seq(1,ncol(ch29),ncols)

# Split ch29 into a bunch of separate data frames
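# pmin() caps the end column in the last name at ncol(ch29);
# drop=FALSE keeps a data frame even if the last piece has a single column
df.list = lapply(setNames(start, paste0("ch29_", start, "_", pmin(start+ncols-1, ncol(ch29)))), 
                 function(i) ch29[ , i:min(i+ncols-1,ncol(ch29)), drop=FALSE])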

You now have a list, df.list, where each list element is a data frame with ncols columns from ch29, except for the last element of the list, which will have between 1 and ncols columns. Also, the name of each list element is the name of the parent data frame (ch29) and the column range from which the subset data frame is drawn.
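To make the list-based approach concrete, here is a minimal sketch on a much smaller stand-in data frame. The names small, pieces, piece_means and env are made up for illustration and are not part of the original question. It shows that each list element is itself a data frame, that you can analyze every piece with lapply, and, assuming you really do need separate named objects, that list2env can copy the elements into an environment.

```r
# Hypothetical stand-in for ch29: 5 rows, 10 columns
set.seed(1)
small <- as.data.frame(replicate(10, rnorm(5)))

ncols <- 4                                   # columns per piece
start <- seq(1, ncol(small), ncols)          # 1, 5, 9
end   <- pmin(start + ncols - 1, ncol(small))  # 4, 8, 10

# One data frame per column range, named after the range it covers
pieces <- lapply(setNames(seq_along(start), paste0("small_", start, "_", end)),
                 function(k) small[, start[k]:end[k], drop = FALSE])

names(pieces)         # "small_1_4" "small_5_8" "small_9_10"
sapply(pieces, ncol)  # 4 4 2

# Analyze every piece without creating separate variables
piece_means <- lapply(pieces, colMeans)

# Optional: copy the list elements into an environment as named objects
env <- new.env()
list2env(pieces, envir = env)
ls(env)
```

Keeping the pieces in a list usually makes the later analysis easier, because one lapply call covers all of them; passing envir = globalenv() to list2env instead would create the objects in your workspace.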

Upvotes: 3

lebelinoz

Reputation: 5068

Try

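# chunk, df and n are as defined in the question
for (i in seq_len(n)) {
  xname = paste("ch29", i, sep = "_")
  col.min = (i - 1) * chunk + 1        # first column of piece i
  col.max = min(i * chunk, ncol(df))   # last column, capped at ncol(df)
  assign(xname, df[, col.min:col.max, drop = FALSE])
}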

In other words, use the notation df[,a:b], where a < b, to get the subset of the dataframe df consisting only of columns a to b.
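As a small worked illustration of that indexing, here is a sketch on a made-up 3 x 7 data frame. The names df_small, chunk_small and n_small are hypothetical, chosen so that the last piece has only one column. In that case a == b, and plain df[, a:b] would return a vector, so the sketch passes drop = FALSE to keep a data frame.

```r
# Hypothetical small data frame: 3 rows, 7 columns (V1..V7)
df_small <- as.data.frame(matrix(1:21, nrow = 3))
chunk_small <- 3
n_small <- ceiling(ncol(df_small) / chunk_small)  # 3 pieces

for (i in seq_len(n_small)) {
  col.min <- (i - 1) * chunk_small + 1
  col.max <- min(i * chunk_small, ncol(df_small))
  # drop = FALSE keeps a data frame even when only one column is left
  assign(paste("small", i, sep = "_"),
         df_small[, col.min:col.max, drop = FALSE])
}

dim(small_1)    # 3 3
names(small_2)  # "V4" "V5" "V6"
dim(small_3)    # 3 1
```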

Upvotes: 1
