user330
user330

Reputation: 1270

How to calculate differences for this data frame in R

I have multiple columns, but here is a small of my data:

df<- read.table(text = "Var1    M1  N1  Var2    M2  N2  Var3    M3  N3  Var4    M4  N4  Var5    M5  N5  Var6    M6  N6
 A11    1   0   A12 0.3 0.5 A21 1   0.3 A22 0.6 0   A31 1   0.2 A32 0   1
             
", header = TRUE)

I want to calculate differences in M and N. So, Ma-Mb and Na-Nb to get the following table

Var diffM   diffN
A1  0.7 -0.5
A2  0.4  0.3
A3  1   -0.8

Upvotes: 2

Views: 48

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101678

A data.table option

rbindlist(
  split.default(
    setDT(df),
    gsub("\\D", "", names(df))
  ),
  use.names=FALSE
)[
  ,
  setNames(
    lapply(.SD, function(x) diff(rev(x))),
    paste0("diff", gsub("\\d", "", names(.SD)))
  ),
  by = .(Var = gsub("\\d$", "", Var1))
]

gives

   Var diffM diffN
1:  A1   0.7  -0.5
2:  A2   0.4   0.3
3:  A3   1.0  -0.8

Upvotes: 1

akrun
akrun

Reputation: 887213

We can reshape to 'long' format first and then do a group by diff across the 'M', 'N' columns

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
   pivot_longer(cols = everything(), names_to = c(".value", 'grp'), 
       names_sep = "(?<=[^0-9])(?=[0-9])") %>% 
   group_by(Var = str_sub(Var, 1, 2)) %>% 
   summarise(across(c(M, N), ~ -diff(.), .names = "diff{.col}"), .groups = 'drop')
# A tibble: 3 x 3
#  Var   diffM diffN
#  <chr> <dbl> <dbl>
#1 A1     0.7   -0.5
#2 A2     0.4    0.3
#3 A3     1     -0.8

If we want the sum for 'N', use that in summarise

df %>% 
   pivot_longer(cols = everything(), names_to = c(".value", 'grp'), 
       names_sep = "(?<=[^0-9])(?=[0-9])") %>% 
   group_by(Var = str_sub(Var, 1, 2)) %>% 
   summarise(across(c(M, N), ~ -diff(.), .names = "diff{.col}"), 
               sumN = sum(N), .groups = 'drop')
# A tibble: 3 x 4
#  Var   diffM diffN  sumN
#  <chr> <dbl> <dbl> <dbl>
#1 A1      0.7  -0.5   0.5
#2 A2      0.4   0.3   0.3
#3 A3      1    -0.8   1.2

Upvotes: 1

Related Questions