Reputation: 42
I have two lists with four data frames each. The data frames in the first list ("loc_list_OBS") have only two columns "Year" and "Mean_Precip" while the data frames in the second list ("loc_list_future") have 33 columns "Year" and then mean precipitation values for 32 different models.
So the data frames in loc_list_OBS look like this but the data goes until Year 2005:
Year Mean_Precip
1950 799.1309
1951 748.0239
1952 619.7572
1953 799.9263
1954 680.9194
1955 766.2304
1956 599.5365
1957 717.8912
1958 739.4901
1959 707.1130
... ....
2005 ....
And the data frames in loc_list_future look like this but with 32 Model columns total and the data goes to Year 2059:
Year Model 1 Model 2 Model 3 ...... Model 32
2020 714.1101 686.5888 1048.4274
2021 1018.0095 766.9161 514.2700
2022 756.7066 902.2542 906.2877
2023 906.9675 919.5234 647.6630
2024 767.4008 861.1275 700.2612
2025 876.1538 738.8370 664.3342
2026 781.5092 801.2387 743.8965
2027 876.3522 819.4323 675.3022
2028 626.9468 927.0774 696.1884
2029 752.4084 824.7682 835.1566
.... ..... ..... .....
2059 ..... ..... .....
Each data frame represents a geographic location, and the two lists have the same four locations but one list is for observed values and the other is for predicted future values.
I would like to run two-sample t-tests that compare the observed values with the predicted future values for each model at each location. Put another way, I want to compare the first data frame in each list, then the second data frame in each list, and likewise for the third and fourth data frames.
Here is the code I have used:
t_stat = NULL
mapply(FUN = function(f, o) {
t_stat <- t.test(o$Mean_Precip, f, alternative = "two.sided")
}, f = loc_list_ttest, o = loc_list_OBS, SIMPLIFY = FALSE)
t_stat
This code gives me only four t-test outputs, each comparing the "Mean_Precip" column in the observed data with what appears to be a combination of all the models in the future data. However, I need a t-test for each model at each location. Can anyone figure out how to do this?
Upvotes: 1
Views: 810
Reputation: 11056
Here is a way of doing what you want, although if the projections were based on the observations, the validity of the p-values is suspect because the two "samples" are not independent.
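# Compare the observed Mean_Precip column (not the whole data frame, which
# would pool Year with the precipitation values) with each model column
results <- lapply(1:4, function(y) lapply(loc_list_future[[y]][, -1],
    function(x) t.test(loc_list_OBS[[y]]$Mean_Precip, x)))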
names(results) <- c("Region 1", "Region 2", "Region 3", "Region 4")
results will be a list containing four lists, one for each region. Within each region list there is one entry for each model: results[[1]] gives you the results for all models in region 1, and results[[1]][[1]] gives you the results for region 1, model 1.
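If a flat table is easier to scan than a nested list, the per-model tests can be collected into one data frame. This is only a sketch: the data below are randomly generated stand-ins (two locations, two models), the observed values are assumed to sit in a Mean_Precip column, and summary_df is a name invented for this example.

```r
# Made-up stand-in data: 2 locations and 2 models instead of 4 and 32
set.seed(1)
loc_list_OBS <- lapply(1:2, function(i)
  data.frame(Year = 1950:2005, Mean_Precip = rnorm(56, 700, 70)))
loc_list_future <- lapply(1:2, function(i)
  data.frame(Year = 2020:2059,
             Model.1 = rnorm(40, 750, 80),
             Model.2 = rnorm(40, 700, 80)))

# One Welch t-test per model column, per location
results <- lapply(seq_along(loc_list_OBS), function(y)
  lapply(loc_list_future[[y]][, -1],
         function(x) t.test(loc_list_OBS[[y]]$Mean_Precip, x)))
names(results) <- paste("Region", seq_along(results))

# Flatten the nested list: one row per region/model combination
summary_df <- do.call(rbind, lapply(names(results), function(region) {
  tests <- results[[region]]
  data.frame(region  = region,
             model   = names(tests),
             t       = sapply(tests, function(tt) unname(tt$statistic)),
             df      = sapply(tests, function(tt) unname(tt$parameter)),
             p_value = sapply(tests, function(tt) tt$p.value),
             row.names = NULL)
}))
summary_df
```

With the real lists, only the block that builds the stand-in data needs to be dropped.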
Upvotes: 0
Reputation: 39595
You can tackle the issue with an approach like this. As I understand it, you want to compare each data frame in the first list with its counterpart in the second list and obtain a t-test for each model column of the second data frame. One approach is to write a function that loops over the variables in the second data frame and saves the results in a list. You will end up with four lists, each holding all the t-tests for one location. I have created dummy data based on what you shared:
#Data
df <- structure(list(Year = c(1950L, 1951L, 1952L, 1953L, 1954L, 1955L,
1956L, 1957L, 1958L, 1959L, 2005L), Mean_Precip = c(799.1309,
748.0239, 619.7572, 799.9263, 680.9194, 766.2304, 599.5365, 717.8912,
739.4901, 707.113, 707.113)), class = "data.frame", row.names = c(NA,
-11L))
#Data2
df1 <- structure(list(Year = c(2020L, 2021L, 2022L, 2023L, 2024L, 2025L,
2026L, 2027L, 2028L, 2029L, 2059L), Model.1 = c(714.1101, 1018.0095,
756.7066, 906.9675, 767.4008, 876.1538, 781.5092, 876.3522, 626.9468,
752.4084, 752.4084), Model.2 = c(686.5888, 766.9161, 902.2542,
919.5234, 861.1275, 738.837, 801.2387, 819.4323, 927.0774, 824.7682,
824.7682), Model.3 = c(1048.4274, 514.27, 906.2877, 647.663,
700.2612, 664.3342, 743.8965, 675.3022, 696.1884, 835.1566, 835.1566
)), class = "data.frame", row.names = c(NA, -11L))
Now, we will create the lists (you already have these):
#Lists
List1 <- list(df1=df,df2=df,df3=df,df4=df)
List2 <- list(df1=df1,df2=df1,df3=df1,df4=df1)
Here is the function:
#Function
myfun <- function(x, y)
{
  #Observed values for this location
  l <- x$Mean_Precip
  #Empty list
  List <- list()
  #Now loop over the model columns (column 1 is Year)
  for(i in 2:ncol(y))
  {
    #Label
    val <- names(y)[i]
    r <- y[, i]
    #Test
    test <- t.test(l, r, alternative = "two.sided")
    #Save
    List[[i-1]] <- test
    names(List)[i-1] <- val
  }
  return(List)
}
Finally, we apply:
#Apply
t.stat <- mapply(FUN = myfun,x=List1,y=List2,SIMPLIFY = FALSE)
The output is a list of lists, and you can explore each element as follows:
t.stat[[1]]
There you will find the results of comparing the first data frame against every model column of the second data frame:
Output:
$Model.1
Welch Two Sample t-test
data: l and r
t = -2.2645, df = 16.448, p-value = 0.03738
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-165.949710 -5.657818
sample estimates:
mean of x mean of y
716.8302 802.6339
$Model.2
Welch Two Sample t-test
data: l and r
t = -3.5901, df = 19.56, p-value = 0.001881
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-170.75516 -45.13574
sample estimates:
mean of x mean of y
716.8302 824.7756
$Model.3
Welch Two Sample t-test
data: l and r
t = -0.72149, df = 13.829, p-value = 0.4826
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-138.01368 68.59334
sample estimates:
mean of x mean of y
716.8302 751.5403
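With 32 models the printed output gets long, so it may help to pull out just the numbers of interest. Below is a rough sketch that builds a models-by-locations matrix of p-values from a list shaped like t.stat. The data are randomly generated stand-ins, and p_values and t_values are names made up for this example.

```r
# Stand-in data (values are random, for illustration only)
set.seed(42)
obs <- data.frame(Year = 1950:1960, Mean_Precip = rnorm(11, 700, 60))
fut <- data.frame(Year = 2020:2030,
                  Model.1 = rnorm(11, 800, 90),
                  Model.2 = rnorm(11, 820, 70),
                  Model.3 = rnorm(11, 750, 120))
List1 <- list(df1 = obs, df2 = obs, df3 = obs, df4 = obs)
List2 <- list(df1 = fut, df2 = fut, df3 = fut, df4 = fut)

# Same shape as t.stat: one element per location, each a named list of tests
t.stat <- Map(function(x, y) lapply(y[-1], function(r) t.test(x$Mean_Precip, r)),
              List1, List2)

# Rows are models, columns are locations
p_values <- sapply(t.stat, function(loc) sapply(loc, function(tt) tt$p.value))
t_values <- sapply(t.stat, function(loc) sapply(loc, function(tt) unname(tt$statistic)))
round(p_values, 4)
```

The same pattern works for any other component of the test object, such as tt$conf.int or tt$estimate, although components longer than one value will not simplify to a plain matrix.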
Upvotes: 3