brum2393

Reputation: 23

How to calculate the variance of a specific variable across multiple datasets in R

I have 3 data sets, each with variables time_tick, gyr_X_value, gyr_Y_value, and gyr_Z_value.

An example of one of the data sets is as follows:

  time_tick gyr_X_value gyr_Y_value gyr_Z_value
1      0.01        0.12        0.24       -0.28
2      0.12        0.00        0.00        0.05
3      0.04        0.10        0.00        0.17
4      0.03        0.00       -0.25        0.15

I know that I can calculate the variance within each individual data set with var(), but how can I calculate the variance of gyr_X_value across all three data sets?

Upvotes: 2

Views: 1512

Answers (3)

Claus Wilke

Reputation: 17790

For those kinds of problems, I strongly recommend the tidyverse approach.

Your data:

df <- read.table(text = "time_tick gyr_X_value  gyr_Y_value  gyr_Z_value
1   .01    .12             .24         -.28               
2   .12      0               0          .05
3   .04    .10               0          .17
4   .03      0            -.25          .15", header = TRUE)

The calculation:

library(tidyverse)

df %>% gather(variable, value, -time_tick) %>%
  group_by(variable) %>%
  summarize(variance = var(value))

## A tibble: 3 x 2
#     variable variance
#        <chr>    <dbl>
#1 gyr_X_value 0.004100
#2 gyr_Y_value 0.040025
#3 gyr_Z_value 0.043425

Explanation: First, the gather function turns your wide data frame into a long one:

df %>% gather(variable, value, -time_tick)
#   time_tick    variable value
#1       0.01 gyr_X_value  0.12
#2       0.12 gyr_X_value  0.00
#3       0.04 gyr_X_value  0.10
#4       0.03 gyr_X_value  0.00
#5       0.01 gyr_Y_value  0.24
#6       0.12 gyr_Y_value  0.00
#7       0.04 gyr_Y_value  0.00
#8       0.03 gyr_Y_value -0.25
#9       0.01 gyr_Z_value -0.28
#10      0.12 gyr_Z_value  0.05
#11      0.04 gyr_Z_value  0.17
#12      0.03 gyr_Z_value  0.15

The group_by() function then sets up the grouping by variable, and the summarize() function calculates the variance separately within the groupings.
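
To extend this to several data sets, one option is to stack them first and then summarize. The following is only a sketch: the names df1, df2, df3 and the values in df2 and df3 are made up for illustration (df1 holds the gyr_X_value column from the question), and it assumes all data sets share the same column names.

```r
library(dplyr)

# df1 is taken from the question; df2 and df3 are hypothetical example data.
df1 <- data.frame(time_tick = c(.01, .12, .04, .03), gyr_X_value = c(.12, 0, .10, 0))
df2 <- data.frame(time_tick = c(.02, .05, .07, .09), gyr_X_value = c(.20, .10, 0, .30))
df3 <- data.frame(time_tick = c(.01, .03, .06, .08), gyr_X_value = c(0, .05, .15, .10))

# Stack the rows; .id adds a column recording which data set each row came from.
all_data <- bind_rows(list(set1 = df1, set2 = df2, set3 = df3), .id = "dataset")

# Variance of gyr_X_value over all rows of all data sets.
pooled <- all_data %>% summarize(variance = var(gyr_X_value))

# Variance of gyr_X_value within each data set, for comparison.
per_set <- all_data %>% group_by(dataset) %>% summarize(variance = var(gyr_X_value))
```

Keeping the dataset column makes it easy to switch between the pooled variance and the per-data-set variances by adding or removing the group_by() step.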

Upvotes: 0

Alan Effrig

Reputation: 773

You can use rbind. Given data frames a, b, and c, they can be combined by row with

combined <- rbind(a,b,c)

See ?rbind for detailed usage. Then you can use var() as usual on a given column, for example combined[, 2], which is gyr_X_value in the layout shown in the question.
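
As a quick illustration, here is a small sketch with three data frames. The object names a, b, c follow the text above; the values in b and c are invented for the example, and only a matches the data in the question. Note that this treats all rows from the three data sets as a single sample.

```r
# a is taken from the question; b and c are hypothetical example data.
a <- data.frame(time_tick = c(.01, .12, .04, .03), gyr_X_value = c(.12, 0, .10, 0))
b <- data.frame(time_tick = c(.02, .05, .07, .09), gyr_X_value = c(.20, .10, 0, .30))
c <- data.frame(time_tick = c(.01, .03, .06, .08), gyr_X_value = c(0, .05, .15, .10))

# Stack the rows (the column names must match), then take the pooled variance.
combined <- rbind(a, b, c)
pooled_var <- var(combined$gyr_X_value)

# Same result without building the combined data frame.
pooled_var2 <- var(c(a$gyr_X_value, b$gyr_X_value, c$gyr_X_value))
```

Referring to the column by name (combined$gyr_X_value) rather than by position avoids surprises if the column order differs between files.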

Upvotes: 0

akrun

Reputation: 887223

We can place the datasets in a list, extract the 'gyr_X_value' column from each, and use rowVars from matrixStats if we need the variance at each row position across the datasets

library(matrixStats)
rowVars(sapply(list(df1, df2, df3), `[[`, 'gyr_X_value'))

If instead the interest is in the variance of that column within each dataset, use var after extracting the column

sapply(list(df1, df2, df3), function(x) var(x[['gyr_X_value']]))

Note: The object names are assumed to be 'df1', 'df2', 'df3'
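
To make the difference between the two calls concrete, here is a base R sketch that avoids the matrixStats dependency. The values in df2 and df3 are made up for illustration (df1 is from the question), and it assumes every dataset has the same number of rows so that sapply returns a matrix.

```r
# df1 is taken from the question; df2 and df3 are hypothetical example data.
df1 <- data.frame(gyr_X_value = c(.12, 0, .10, 0))
df2 <- data.frame(gyr_X_value = c(.20, .10, 0, .30))
df3 <- data.frame(gyr_X_value = c(0, .05, .15, .10))

# One column per dataset, one row per observation: a 4 x 3 matrix here.
x_matrix <- sapply(list(df1, df2, df3), `[[`, "gyr_X_value")

# Variance across the datasets at each row position (what rowVars computes).
row_var <- apply(x_matrix, 1, var)

# Variance within each dataset (same as the sapply/var call above).
per_dataset <- apply(x_matrix, 2, var)
```

If the datasets have different numbers of rows, sapply returns a list instead of a matrix, and only the per-dataset version still makes sense.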

Upvotes: 1
