cgxytf

Reputation: 431

Unpaired t-test using data within one column

I want to run unpaired t-tests to examine whether values differ between sites within each type category.

So my question is, within types (AB or CD), do values (valueA or valueB) differ between sites (A or B)?

Here is an example of my data:

dat <- data.frame(
  "site" = c("A","B","B","A","A","B","B","A"), 
  "type" = c("AB","CD"), 
  "valueA" = c(13,-10,-5,18,-14,12,-17,19), 
  "valueB" = c(-3,20,15,-16,12,15,-11,14)
)
dat

site type valueA valueB
A   AB     13     -3
B   CD    -10     20
B   AB     -5     15
A   CD     18    -16
A   AB    -14     12
B   CD     12     15
B   AB    -17    -11
A   CD     19     14

I am trying to do four unpaired t-tests to examine:

  1. If valueA Type AB, differs between site A vs. site B
  2. If valueB Type AB, differs between site A vs. site B
  3. If valueA Type CD, differs between site A vs. site B
  4. If valueB Type CD, differs between site A vs. site B

To run the unpaired t-tests, I believe I need to rearrange my data so that type AB, type CD, site A and site B each get their own column (instead of being values within the type or site column).
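Reshaping is not strictly required for these four tests. A minimal base-R sketch, assuming the Welch two-sample test that t.test() runs by default is acceptable, picks out each site/type subset with logical indexing and collects the p-values in a type-by-value matrix (the name pvals is only illustrative):

```r
# Rebuild the example data from the question
dat <- data.frame(
  site   = c("A","B","B","A","A","B","B","A"),
  type   = c("AB","CD"),  # recycled to AB, CD, AB, CD, ...
  valueA = c(13,-10,-5,18,-14,12,-17,19),
  valueB = c(-3,20,15,-16,12,15,-11,14)
)

# For each value column and each type, compare site A against site B
# with an unpaired t-test; logical indexing selects each subset directly
pvals <- sapply(c("valueA", "valueB"), function(col) {
  sapply(c("AB", "CD"), function(ty) {
    x <- dat[[col]][dat$type == ty & dat$site == "A"]
    y <- dat[[col]][dat$type == ty & dat$site == "B"]
    t.test(x, y)$p.value
  })
})
pvals  # rows: type (AB, CD); columns: valueA, valueB
```

Each cell of pvals corresponds to one of the four comparisons listed above. Note that every site/type combination has only two observations in this example, so these tests have very little power.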

EDIT:

Using the suggested code in the comments:

library(dplyr)
dat %>% 
  group_by(site, type) %>% 
  summarise(pval = t.test(valueA, valueB)$p.value)

The output is this:

site  type   pval
A     AB    0.784
A     CD    0.417
B     AB    0.492
B     CD    0.365

As I understand it, these p-values test for a difference between valueA and valueB within each site/type combination.

What I am looking for is, for example, the difference in valueA between site A and site B within type CD.

So, if I am thinking about this correctly, the output should have columns for type, valueA and valueB, and the p-values should be for the differences between sites.

Similar to this:

type  valueA  valueB
AB     0.365   0.784
CD     0.492   0.417

Does this make sense?

Upvotes: 0

Views: 100

Answers (2)

AndS.

Reputation: 8120

I think I see what you're asking for. See if this works for you:

library(tidyverse)

dat %>% 
  pivot_longer(cols = c(valueA, valueB), names_to = "name", values_to = "val") %>%
  split(.$site) %>%
  map(., ~rename(.x, !!sym(paste0(.x$site[[1]], "val")) := val) %>%
        select(-site)) %>%
  reduce(full_join, by = c("type", "name")) %>%
  group_by(type, name) %>%
  summarise(p.val = t.test(Aval, Bval)$p.value) %>%
  pivot_wider(id_cols = type, names_from = name, values_from = p.val)
#> # A tibble: 2 x 3
#> # Groups:   type [2]
#>   type  valueA valueB
#>   <fct>  <dbl>  <dbl>
#> 1 AB    0.284   0.785
#> 2 CD    0.0703  0.121

Here we go from wide to long and split the data frame by site, rename the values of interest to include the site, re-join the pieces, and then run a grouped t.test by type and value column (name).
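The join step deserves a closer look: each site contributes two rows for every (type, name) key, and a join on a key that is not unique on either side returns every combination of matching rows. The sketch below reproduces that with base R's merge(), which matches rows the same way full_join() does for keys present on both sides, using the type AB / valueA numbers from dat (the names a, b, joined, p_joined and p_direct are only for illustration):

```r
# Site A and site B rows for one key (type AB, valueA) from the example data
a <- data.frame(type = "AB", name = "valueA", Aval = c(13, -14))
b <- data.frame(type = "AB", name = "valueA", Bval = c(-5, -17))

# Joining on a non-unique key yields every combination: 2 x 2 = 4 rows,
# so each original observation appears twice in the result
joined <- merge(a, b, by = c("type", "name"), all = TRUE)
nrow(joined)

# t-test on the joined (duplicated) columns vs. on the original values
p_joined <- t.test(joined$Aval, joined$Bval)$p.value  # 4 vs 4 values
p_direct <- t.test(a$Aval, b$Bval)$p.value            # 2 vs 2 values
c(p_joined = p_joined, p_direct = p_direct)
```

Because each value enters t.test() twice after the join, the standard error and degrees of freedom change, and so does the p-value. Comparing the subsets directly, e.g. t.test(x[site == "A"], x[site == "B"]) within each type, avoids the duplication.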

Upvotes: 1

akrun

Reputation: 887831

We can group_by 'site' and 'type' and apply t.test

library(dplyr)
out <- dat %>% 
         group_by(site, type) %>% 
         summarise(pval = t.test(valueA, valueB)$p.value)

By default, t.test uses paired = FALSE, so these are unpaired tests (it also defaults to var.equal = FALSE, i.e. Welch's t-test).

The output above can be reshaped to 'wide' format with pivot_wider

library(stringr)
library(tidyr)
# columns are sites; each p-value compares valueA with valueB within that site
out %>%
    ungroup %>%
    mutate(site = str_c('site', site)) %>% 
    pivot_wider(names_from = site, values_from = pval)
# A tibble: 2 x 3
#  type   siteA  siteB
#  <fct>  <dbl>  <dbl>
#1 AB     0.784  0.492
#2 CD     0.417  0.365
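The t.test() defaults mentioned in this answer can be checked directly. A small sketch on the type AB / valueA values from dat: calling t.test() with no extra arguments gives the same p-value as spelling out paired = FALSE and var.equal = FALSE (Welch's test), while var.equal = TRUE switches to the pooled-variance Student test (the variable names are only illustrative):

```r
# Site A vs. site B values for type AB, valueA (from the example data)
x <- c(13, -14)
y <- c(-5, -17)

p_default <- t.test(x, y)$p.value                                    # defaults
p_welch   <- t.test(x, y, paired = FALSE, var.equal = FALSE)$p.value # explicit Welch
p_student <- t.test(x, y, var.equal = TRUE)$p.value                  # pooled variance

identical(p_default, p_welch)  # the defaults are the unpaired Welch test
c(welch = p_welch, student = p_student)
```

With equal group sizes the t statistic is the same for both tests; only the degrees of freedom, and therefore the p-value, differ.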

If we want to compare the 'value' columns between 'AB' and 'CD' within each site:

dat %>% 
   group_by(site) %>% 
   summarise_at(vars(starts_with('value')), 
          ~ t.test(.[type == 'AB'], .[type == 'CD'])$p.value)
# A tibble: 2 x 3
#  site  valueA valueB
#  <fct>  <dbl>  <dbl>
#1 A      0.393  0.784
#2 B      0.464  0.439
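Swapping the roles of site and type in the call above gives the four comparisons asked about in the question (site A vs. site B within each type), already in the type / valueA / valueB layout. A sketch, assuming dplyr >= 1.0 so that across() is available (res is only an illustrative name):

```r
library(dplyr)

dat <- data.frame(
  site   = c("A","B","B","A","A","B","B","A"),
  type   = c("AB","CD"),  # recycled to AB, CD, AB, CD, ...
  valueA = c(13,-10,-5,18,-14,12,-17,19),
  valueB = c(-3,20,15,-16,12,15,-11,14)
)

# Within each type, compare site A against site B for every value column
# (unpaired Welch t-test, the t.test() default)
res <- dat %>%
  group_by(type) %>%
  summarise(across(c(valueA, valueB),
                   ~ t.test(.x[site == "A"], .x[site == "B"])$p.value))
res
```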

Upvotes: 2
