I am currently working with a list of dataframes.
Actually, I have about a hundred csv files representing forecasts of some kind. The first line of each file holds the date on which the forecast was made, and the lines after it contain the predicted values. The data might look like this:
2010/04/15 10:12:51 #Date of the forecast
2010/05/02 2372 #Date for which the forecast was made and the value assigned
2010/05/09 2298
2009/04/15 10:09:13 #another forecast
....
2010/05/02 2298 #also predicts for 2010/05/02
As you might guess, the forecasts predict values quite far ahead (e.g. 5 years), which means predictions for the date 2010/05/02 were made not only on 2010/04/15 but also on 2009/04/15 and so on (in fact, forecasts are made weekly).
I would like to compare how the predicted value for a specified date (for example 2010/05/02) has changed over time.
Right now, I read each .csv file into a dataframe and save the resulting dataframes in a list.
(Sadly, the date on which each prediction was made got lost. I had hoped to name the list elements with the respective dates but have not yet figured out how to do this. Still, I am fairly sure I'll find something somewhere; it is not the main problem here.)
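One way to keep the forecast date is to read it from the first line of each file before parsing the rest, and then use it to name the list elements. A minimal sketch, assuming the files are whitespace-separated exactly as in the sample above (with `#` starting a comment); `read_issued`, `read_forecast` and `fc_list` are hypothetical names, and the demo writes two temporary files instead of reading the real ones:

```r
# Hypothetical helper: the first line of a file holds the time the forecast was made.
read_issued <- function(path) {
  trimws(sub("#.*$", "", readLines(path, n = 1)))  # drop any trailing '#' comment
}

# Hypothetical helper: the remaining lines hold "target-date value" pairs.
# Assumes whitespace-separated fields; read.table treats '#' as a comment by default.
read_forecast <- function(path) {
  df <- read.table(path, skip = 1, col.names = c("date", "value"),
                   colClasses = c("character", "numeric"))
  df$date <- as.Date(df$date, format = "%Y/%m/%d")
  df
}

# Demo: two small files in the layout shown above (real code might use list.files())
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("2010/04/15 10:12:51", "2010/05/02 2372", "2010/05/09 2298"), f1)
writeLines(c("2009/04/15 10:09:13", "2010/05/02 2298"), f2)

files <- c(f1, f2)
fc_list <- lapply(files, read_forecast)
# Name each list element after the date its forecast was made
names(fc_list) <- vapply(files, read_issued, character(1), USE.NAMES = FALSE)
names(fc_list)
```

Reading the first line separately keeps the parsing of the data rows simple, at the cost of opening each file twice.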
That's where the question title comes in: I would like to know how to filter a list of dataframes by row value.
So, I'd like to have a function, e.g. function(2010/05/02), that returns the rows of each element of the list (each dataframe in the list) where the date is 2010/05/02.
In this case I'd like to get:
2010/05/02 2372
2010/05/02 2298
I know how to do this with a for loop, but it takes far too long.
I am grateful for any suggestions.
(This example shows why it is important to know when each prediction was made, which I currently cannot tell. I was thinking about adding the date on which the prediction was made to each dataframe.)
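If the forecast date is stored as an extra column in each dataframe, all dataframes can be combined once, and comparing the predictions for one target date over time becomes a single filter rather than a loop per query. A minimal sketch, assuming a list `fc_list` named by forecast date whose elements have `date` and `value` columns (all hypothetical names); the two small dataframes stand in for the real files:

```r
# Stand-in data: a list named by the date each forecast was made
fc_list <- list(
  "2010/04/15" = data.frame(date = as.Date(c("2010-05-02", "2010-05-09")),
                            value = c(2372, 2298)),
  "2009/04/15" = data.frame(date = as.Date("2010-05-02"), value = 2298)
)

# Add the forecast date as a column (Map pairs each dataframe with its name)
tagged <- Map(function(df, issued) {
  df$issued <- as.Date(issued, format = "%Y/%m/%d")
  df
}, fc_list, names(fc_list))

# Combine once into one long table
all_fc <- do.call(rbind, tagged)
rownames(all_fc) <- NULL

# All predictions made for one target date, oldest forecast first
target <- as.Date("2010-05-02")
history <- all_fc[all_fc$date == target, c("issued", "value")]
history[order(history$issued), ]
```

Here the result has two rows: the forecast issued 2009-04-15 (2298) and the one issued 2010-04-15 (2372).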
Threads I have visited so far include:
get column from list of dataframes R
convert a row of a data frame to a simple vector in R
How to get the name of a data.frame within a list? (which more or less addresses the naming problem)
As you can see, no thread was particularly helpful.
As requested, a small reproducible example:
dateList <- as.Date(seq(0,100,5),origin="2010-01-01")
forecasts <- seq(2000,3000,50)
df1 <- data.frame(dateList,forecasts)
df2 <- data.frame(dateList = dateList - 50, forecasts)  # name the column so it matches df1
l <- list(df1,df2)
We have dates starting at 2010-01-01 in 5-day steps. I would, for example, like to know the predicted value for 2010-01-01 in both dataframes.
The first dataframe looks like this:
dateList forecasts
1 2010-01-01 2000
2 2010-01-06 2050
3 2010-01-11 2100
while the second looks like this:
10 2009-12-27 2450
11 2010-01-01 2500
12 2010-01-06 2550
So, for example, calling function(2010-01-01) should return:
2000
2500
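With both dataframes sharing the column name `dateList`, the desired function could be sketched as below. `predicted_on` is a hypothetical name, and the example data are rebuilt from the reproducible example above so the block runs on its own:

```r
# Reproducible example from the question (df2's date column named explicitly)
dateList <- as.Date(seq(0, 100, 5), origin = "2010-01-01")
forecasts <- seq(2000, 3000, 50)
df1 <- data.frame(dateList, forecasts)
df2 <- data.frame(dateList = dateList - 50, forecasts)
l <- list(df1, df2)

# Hypothetical helper: for each dataframe in lst, take the forecast(s) whose
# dateList equals the target date, and flatten the results into one vector
predicted_on <- function(target, lst) {
  target <- as.Date(target)
  unlist(lapply(lst, function(d) d$forecasts[d$dateList == target]))
}

predicted_on("2010-01-01", l)
# [1] 2000 2500
```

A dataframe without a matching date simply contributes nothing to the result.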
You could alternatively use the following approach, given that your list is called ls and the date column is called date in all data.frames:
my.ls <- lapply(ls, subset, date == "2010/05/02")
df <- do.call("rbind", my.ls)
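R's documentation describes `subset()` as a convenience function intended for interactive use and recommends the standard subsetting functions such as `[` for programming. A minimal sketch of the same lapply-then-rbind pattern using `[`, assuming (as above) that every dataframe has a column called `date`; `rows_on` and the sample list `dfs` are hypothetical:

```r
# Hypothetical sample list: every dataframe has a 'date' column of class Date
dfs <- list(
  data.frame(date = as.Date(c("2010-05-02", "2010-05-09")), value = c(2372, 2298)),
  data.frame(date = as.Date(c("2010-04-25", "2010-05-02")), value = c(2310, 2298))
)

# Same pattern as lapply(ls, subset, ...), but with `[`, so 'target' is an
# ordinary argument rather than part of a non-standard-evaluated expression
rows_on <- function(lst, target) {
  target <- as.Date(target)  # accepts "2010/05/02" as well as "2010-05-02"
  hits <- lapply(lst, function(d) d[d$date == target, , drop = FALSE])
  do.call(rbind, hits)
}

rows_on(dfs, "2010/05/02")
```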
Couldn't wait for your example so I made a small one. Let me know if this is in the general direction of what you're after.
xy <- list(df1 = data.frame(dates = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03")), value = runif(3)),
df2 = data.frame(dates = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03")), value = runif(3)),
df3 = data.frame(dates = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03")), value = runif(3))
)
getValueOnDate <- function(x, list.all) {
lapply(list.all, FUN = function(m) m[m$dates %in% x, ])
}
out <- getValueOnDate(as.Date("2016-01-02"), list.all = xy)
do.call("rbind", out)
dates value
df1 2016-01-02 0.7665590
df2 2016-01-02 0.9907976
df3 2016-01-02 0.4909025
You can obviously modify the function to return just the values.
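For example, a variant that returns only the values, named after the list elements, might look like this; `getValuesOnDate` is a hypothetical name, and fixed numbers replace `runif()` so the result is predictable:

```r
xy <- list(
  df1 = data.frame(dates = as.Date(c("2016-01-01", "2016-01-02")), value = c(0.1, 0.2)),
  df2 = data.frame(dates = as.Date(c("2016-01-01", "2016-01-02")), value = c(0.3, 0.4))
)

# Like getValueOnDate(), but keeps only the 'value' column of the matching rows;
# unlist() flattens the per-dataframe results into one named vector
getValuesOnDate <- function(x, list.all) {
  unlist(lapply(list.all, FUN = function(m) m$value[m$dates %in% x]))
}

getValuesOnDate(as.Date("2016-01-02"), list.all = xy)
```

Here the call returns `c(df1 = 0.2, df2 = 0.4)`. If a dataframe matches more than one row, `unlist()` suffixes the names (e.g. `df1.1`, `df1.2`).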