I have a very large data set that I have already split into 50 pieces, so the files are basically file1, file2, file3, ..., file50 (all data frames).
file_total <- c(file1,...,file50)
I know this will combine them into a list, but I can't use rbind, since the complete data set is huge and the plyr library just takes forever to run.
In each of the files, I have to split the data by one factor, call it "id", and then write each of the id subsets to its own .csv file.
So far, my code is:
d_split <- split(file1, file1[1])
library(plyr)
id <- unlist(lapply(d_split, "[", 1, 1)) # this returns the unique id
for (j in seq_along(id))
{
  write.csv(d_split[[j]], file = paste(id[j], "csv", sep = "."))
}
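For reference, split() returns a named list whose names are the distinct values of the grouping column, so the id of each subset can also be read straight from names(d_split) instead of indexing into every piece. A small check with a made-up data frame (the columns id and value are assumptions for illustration, not the real data):

```r
# Toy data frame standing in for one of the 50 pieces (hypothetical columns)
file1 <- data.frame(id = c("a", "b", "a", "c"), value = 1:4)

# One data frame per distinct value of the first column
d_split <- split(file1, file1[1])

# First cell of each subset, as in the question's code
id_first_cell <- unlist(lapply(d_split, "[", 1, 1))

# The list names already hold the same group labels
id_names <- names(d_split)

print(id_names)  # [1] "a" "b" "c"
```

paste(id_names, "csv", sep = ".") then gives the same file names as the question's loop.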
This works! But it doesn't work when I put it inside another for loop:
for (i in file_total)
{
  d_split <- split(i, i[1])
  id <- unlist(lapply(d_split, "[", 1, 1))
  for (j in seq_along(id))
  {
    write.csv(d_split[[j]], file = paste(id[j], "csv", sep = "."))
  }
}
It returns the following error message:
Error in FUN(X[[1L]], ...) : incorrect number of dimensions
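The error can be reproduced with two small made-up data frames (hypothetical data). Because c() flattens them into a list of column vectors, the loop variable i is a plain vector, and indexing a vector with two subscripts via "[" with 1, 1 is what raises "incorrect number of dimensions":

```r
# Two toy data frames standing in for file1 and file2 (hypothetical data)
file1 <- data.frame(id = c(1, 1, 2), value = 1:3)
file2 <- data.frame(id = c(3, 4, 4), value = 4:6)

file_total <- c(file1, file2)  # a flat list of 4 column vectors

err <- tryCatch({
  for (i in file_total) {
    d_split <- split(i, i[1])                 # i is a vector, not a data frame
    id <- unlist(lapply(d_split, "[", 1, 1))  # two subscripts on a vector: error
  }
  NULL
}, error = function(e) conditionMessage(e))

print(err)  # [1] "incorrect number of dimensions"
```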
I mean, I could do it manually by copying and pasting the code for all 50 files, but I was wondering if anyone could fix my code so that a single run gets it done.
The problem comes from how you combine the data. Instead of combining the data frames with c(), put them into a list:
file_total <- list(file1,...,file50)
At that point, for (i in file_total) will iterate over the data frames, as you want it to.
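Putting it together, a minimal sketch of the whole loop, assuming the pieces exist in the workspace as file1, file2, ... with the id in the first column. It uses mget(paste0("file", 1:n)) to collect them into a list without typing every name (this relies on that naming pattern), takes each id from names(d_split), and writes into a hypothetical subdirectory of tempdir() only so the example is self-contained:

```r
# Toy pieces standing in for file1 ... file50 (hypothetical data)
file1 <- data.frame(id = c("a", "b", "a"), value = 1:3)
file2 <- data.frame(id = c("c", "c", "d"), value = 4:6)
file3 <- data.frame(id = c("e", "f", "f"), value = 7:9)

# Collect the data frames into a list (assumes the names file1, file2, ...)
file_total <- mget(paste0("file", 1:3))

# Hypothetical output directory; replace with your own
out_dir <- file.path(tempdir(), "id_subsets")
dir.create(out_dir, showWarnings = FALSE)

for (piece in file_total) {
  # One data frame per distinct id in the first column
  d_split <- split(piece, piece[1])
  for (j in seq_along(d_split)) {
    out_file <- file.path(out_dir, paste(names(d_split)[j], "csv", sep = "."))
    write.csv(d_split[[j]], file = out_file, row.names = FALSE)
  }
}

list.files(out_dir, pattern = "\\.csv$")
```

row.names = FALSE is an optional choice that drops R's row-name column from the output. Note that write.csv overwrites an existing file, so if the same id occurs in more than one piece, the later piece's subset replaces the earlier one.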
As an explanation: using c() on data frames (as I'm assuming file1 and file2 are) concatenates their columns, so the result is a list of vectors rather than a list of data frames. For instance:
file1 = data.frame(x=1:20)
file2 = data.frame(y=20:40)
file_total = c(file1, file2)
# file_total will be:
# $x
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#
# $y
# [1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Thus, iterating over the result actually iterates over the individual columns as vectors. Using list() to combine them instead lets you iterate over the data frames themselves:
> list(file1, file2)
[[1]]
    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20

[[2]]
    y
1  20
2  21
3  22
4  23
5  24
6  25
7  26
8  27
9  28
10 29
11 30
12 31
13 32
14 33
15 34
16 35
17 36
18 37
19 38
20 39
21 40
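A quick way to confirm the difference is to check the length and class of each element. With two made-up three-column data frames (hypothetical data), c() yields one element per column while list() keeps one element per data frame, which is why only the latter makes the outer loop run once per file:

```r
# Two hypothetical data frames with three columns each
a <- data.frame(id = 1:2, x = 3:4, y = 5:6)
b <- data.frame(id = 7:8, x = 9:10, y = 11:12)

flat <- c(a, b)        # list of 6 column vectors
nested <- list(a, b)   # list of 2 data frames

length(flat)                    # 6
sapply(flat, is.data.frame)     # all FALSE
length(nested)                  # 2
sapply(nested, is.data.frame)   # TRUE TRUE
```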