JuanJMV

Reputation: 145

Calculate variables within family in a long format dataset

I have a data set like this one:

df <- data.frame(FAMID = c(1,1,2,2,3,4,5,5,6),
                 IID =   c("A","B","A","B","A","B","A","B","B"),
                 Value = c(3,6,3,5,6,7,0,4,4))
#   FAMID IID Value
# 1     1   A     3
# 2     1   B     6
# 3     2   A     3
# 4     2   B     5
# 5     3   A     6
# 6     4   B     7
# 7     5   A     0
# 8     5   B     4
# 9     6   B     4

I need to calculate two variables within each family: the difference in Value between the two members of the pair, and the pair's mean. Families with only one member should get NA for both. The output I need looks like this:

df2 <- data.frame(
  FAMID2 = c(1,1,2,2,3,4,5,5,6),
  IID2 = c("A","B","A","B","A","B","A","B","B"),
  Value2 = c(3,6,3,5,6,7,0,4,4),
  DiffValue = c(-3, 3,-2,2,NA, NA, -4, 4, NA),
  Mean = c(4.5,4.5,4,4,NA,NA,2,2,NA))
#   FAMID2 IID2 Value2 DiffValue Mean
# 1      1    A      3        -3  4.5
# 2      1    B      6         3  4.5
# 3      2    A      3        -2  4.0
# 4      2    B      5         2  4.0
# 5      3    A      6        NA   NA
# 6      4    B      7        NA   NA
# 7      5    A      0        -4  2.0
# 8      5    B      4         4  2.0
# 9      6    B      4        NA   NA

Is there a way to do this while keeping the data in long format?

Thank you so much in advance.

Upvotes: 0

Answers (2)

Friede

Reputation: 8244

Using by() from base R (array2DF() requires R 4.3.0 or later):

by(df, ~FAMID, \(x) {
  if (nrow(x) == 1L) { x$Diff <- NA; x$Mean <- NA } else {
    d <- x$Value[1L] - x$Value[2L]
    x$Diff <- c(d, -d); x$Mean <- mean(x$Value)
  }
  x[-1L]  # drop FAMID here; array2DF() adds it back from the by() index
}) |> array2DF()

    FAMID IID Value Diff Mean
1       1   A     3   -3  4.5
1.1     1   B     6    3  4.5
2       2   A     3   -2  4.0
2.1     2   B     5    2  4.0
3       3   A     6   NA   NA
4       4   B     7   NA   NA
5       5   A     0   -4  2.0
5.1     5   B     4    4  2.0
6       6   B     4   NA   NA

Upvotes: 0

Ronak Shah

Reputation: 389325

You may try this (the .by argument requires dplyr 1.1.0 or later):

library(dplyr)

df %>%
  mutate(DiffValue = if (n() == 1) NA else {
    a = Value[1] - Value[2]
    c(a, -a)
    },
    Mean = if (n() == 1) NA else mean(Value), .by = FAMID)

#  FAMID IID Value DiffValue Mean
#1     1   A     3        -3  4.5
#2     1   B     6         3  4.5
#3     2   A     3        -2  4.0
#4     2   B     5         2  4.0
#5     3   A     6        NA   NA
#6     4   B     7        NA   NA
#7     5   A     0        -4  2.0
#8     5   B     4         4  2.0
#9     6   B     4        NA   NA
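
Note that the code above assumes every family has at most two rows; a family with three or more members would make `c(a, -a)` the wrong length. As a rough sketch only (not part of the original answer), the same result can be obtained without dplyr by using `ave()` from base R and treating any family that does not have exactly two members as NA. The helper names `n`, `s` and `pair` are made up for illustration, and applying the NA rule to larger families is an assumption about what is wanted.

```r
df <- data.frame(FAMID = c(1,1,2,2,3,4,5,5,6),
                 IID   = c("A","B","A","B","A","B","A","B","B"),
                 Value = c(3,6,3,5,6,7,0,4,4))

# Family size and family sum, repeated on every row in the original row order
n <- ave(df$Value, df$FAMID, FUN = length)
s <- ave(df$Value, df$FAMID, FUN = sum)

# Assumption: only families with exactly two members count as a pair
pair <- n == 2

# In a pair the partner's value is s - Value, so own minus partner is 2 * Value - s
df$DiffValue <- ifelse(pair, 2 * df$Value - s, NA)
df$Mean      <- ifelse(pair, s / 2, NA)
df
```

Because `ave()` returns a vector aligned with the input rows, the data stay in long format and no reshaping or sorting is needed.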

Upvotes: 3
