Reputation: 31
Through investigations and great help from users on this site I have been able to upload multiple xlsx files into R, both as individual dataframes and as a single object containing multiple dataframes
(Folder="I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/"
Files=list.files(path=Folder, pattern=".xlsx")
x=sapply(paste0(Folder,Files), read.xlsx2, as.data.frame=T, sheetIndex=1, simplify=FALSE)
with the code above, I can call on an individual df
as x$~2015-B1-2OR.xlsx
, but how would I iterate through each of these objects and is this an easier way to approach the problem than setting a for loop with a ton of dataframes (~200)?
Example data if I call on object x
:
$`I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/2015-X2-2OR.xlsx`
Year Day Tank depth.1 depth.2 mid.depth S
1 2015 2OR X2 0.11 0.135 0.1225 4.1
2 2015 2OR X2 0.135 0.16 0.1475 5.6
$`I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/2015-X2-OR10.xlsx`
Year Day Tank depth.1 depth.2 mid.depth S
1 2015 OR10 X2 0.075 0.1 0.0875 4.6
2 2015 OR10 X2 0.1 0.125 0.1125 4.2
3 2015 OR10 X2 0.125 0.16 0.1425 5.2
4 2015 OR10 X2 0.16 0.175 0.1675 5.2
5 2015 OR10 X2 0.175 0.2 0.1875 6.8
6 2015 OR10 X2 0.2 0.225 0.2125 7.5
7 2015 OR10 X2 0.225 0.25 0.2375 8.8
You can see there are multiple columns and rows within each level of x
. How can I iterate through the levels of x
and call upon specific columns?
For example lapply(x, nrow)
lists the rows of each level but what if I want to return nrow
of particular columns?
Upvotes: 0
Views: 45
Reputation: 6496
First off, I think it's good idea to simplify the names of your x
list:
names(x) <- gsub("^I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/|\\.xlsx","",names(x))
As your data is just too complex, I made a list to work with:
A <- structure(list(A1 = structure(list(x = structure(c(1L, 1L, 2L,
3L, 2L), .Label = c("a", "b", "c"), class = "factor"), y = c(0.00840516341850162,
0.970356883713976, 0.469053473789245, 0.847559429006651, 0.646102252649143
)), .Names = c("x", "y"), row.names = c(NA, -5L), class = "data.frame"),
A2 = structure(list(x = structure(c(1L, 1L, 2L, 3L, 2L), .Label = c("a",
"b", "c"), class = "factor"), y = c(0.599587128963321, 0.390590411843732,
0.11197471502237, 0.824506989680231, 0.608971498440951)), .Names = c("x",
"y"), row.names = c(NA, -5L), class = "data.frame"), A3 = structure(list(
x = structure(c(1L, 1L, 2L, 3L, 2L), .Label = c("a",
"b", "c"), class = "factor"), y = c(-2.61798606622622,
0.696978535260992, -0.758098875328806, -1.08656950056061,
1.3469375195447), z = c(0.346128243254498, 0.691243288572878,
0.285317465662956, 0.125597422709689, 0.0258157614152879
)), .Names = c("x", "y", "z"), row.names = c(NA, -5L), class = "data.frame")), .Names = c("A1",
"A2", "A3"))
Now, I think you can use lapply
for just about any operation you want to perform on each of the data frames:
# get the number of observations in each data frame
lapply(A,nrow)
# produces number of rows of each data frame
library(data.table)
lapply(A,setDT)
lapply(A, function(j) j[,sum(y),by=x][order(-V1)])
# sums A[i]$y over each level in A[i]$x; with i being each data frame in A
and pretty much anything you want to do...
Upvotes: 1