Reputation: 47
I have several data frames in wide format imported from dbf files. Every column is a date and every row is an observation, so for each day I have between 500 and 2000 observations, depending on the size of the geographic shape I am looking at. For reproducibility, I created two dummy data frames covering the range of values I may see in my actual data frames.
Data1 <- data.frame(replicate(10, sample(0:1000, 20, replace = TRUE)))
Data <- data.frame(replicate(10, sample(0:1000, 20, replace = TRUE)))
Since I have many of these data frames I have put them in a list so I can run functions on many at once.
filenames<- mget(ls(pattern= 'Data'))
Now my issue is that I am trying to write a function to count the number of occurrences in each column where values are within the range 0-100. I can accomplish this with
library(plyr)
Datacount<- ldply(Data, function(x) length(which(x>=0 & x<=100)))
Then I need to find the first column (date) in which this count is greater than 10% of the total number of observations per column. So for a data frame with 20 observations, I want the first date where the number of cells between 0 and 100 is greater than 2. I previously accomplished this using apply (where "V1" is the name of the column containing the counts):
Datamatch<- apply (Datacount["V1"]>2,2,function(x) match (TRUE,x))
My question is whether I can combine these steps into one process that I can run over "filenames", either in a for loop or with one of the lapply family of functions.
For detail, here is an example of a function I built to run across each row of a data frame. It gives the column index of the last date where each row value is <= 100. I then used lapply to loop over all data frames in my list and append the result of the function to each original data frame.
icein <- function(dataframe) {
  dataframe$icein <- apply(dataframe, 1, function(x) tail(which(x <= 100), 1))
  dataframe
}
list2env(lapply(filenames, icein), envir= .GlobalEnv)
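To illustrate what the row-wise step computes, here is a minimal sketch on a small hand-made data frame. The names `toy` and `last_col_at_most` are hypothetical and not part of the original code. It assumes that a row with no value <= 100 should yield NA; with `tail(which(x <= 100), 1)` such a row produces a zero-length result instead, which can make `apply` return a list rather than a vector.

```r
# Hypothetical helper: for each row, the index of the last column
# whose value is <= limit, or NA when no column qualifies (an assumption).
last_col_at_most <- function(df, limit = 100) {
  vapply(seq_len(nrow(df)), function(i) {
    hits <- which(unlist(df[i, ]) <= limit)
    if (length(hits) == 0L) NA_integer_ else hits[length(hits)]
  }, integer(1))
}

# Toy data: three rows, three "date" columns
toy <- data.frame(d1 = c(50, 500, 20),
                  d2 = c(80, 600, 300),
                  d3 = c(400, 700, 90))

last_col_at_most(toy)  # 2 NA 3
```

Using `vapply` with `integer(1)` keeps the result a plain integer vector of length `nrow(df)`, so it can be assigned as a new column without surprises.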
Upvotes: 1
Views: 302
Reputation: 886938
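After loading all the 'Data' objects into a list, loop over the list with map(). For each data frame, take the mean of the logical vector between(., 0, 100) in every column, which gives the proportion of values in that range, and check whether it is greater than or equal to the threshold n (0.2 in the code below). Then unlist() the resulting one-row data.frame, wrap it with which() to get the position index, and extract the first() one.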
library(dplyr)
library(purrr)
n <- 0.2
mget(ls(pattern = 'Data')) %>%
  map_int(~ .x %>%
    summarise_all(~ mean(between(., 0, 100)) >= n) %>%
    unlist %>%
    which %>%
    first)
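For comparison, the same idea can be sketched in base R without extra packages. This is only an illustration: the helper name `first_col_over` and the toy data are hypothetical, and it assumes the strict "greater than 10%" rule from the question rather than the `>= n` comparison used in the pipeline above.

```r
# Hypothetical helper: index of the first column whose share of values
# in [lo, hi] is strictly greater than prop; NA if no column qualifies.
first_col_over <- function(df, lo = 0, hi = 100, prop = 0.1) {
  share <- colMeans(df >= lo & df <= hi)  # proportion of in-range values per column
  unname(which(share > prop)[1])
}

# Toy data: column a has no values in range, b has half, c has three quarters
toy <- data.frame(a = c(500, 600, 700, 800),
                  b = c(50, 60, 700, 800),
                  c = c(10, 20, 30, 900))

first_col_over(toy)  # 2

# Applied over a named list of data frames, as with mget() in the question
sapply(list(x = toy, y = toy["a"]), first_col_over)
```

Because the comparison is on proportions, the threshold does not need to be recomputed for data frames with different numbers of rows.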
Upvotes: 1