nilrem
nilrem

Reputation: 85

Get the result of a mathematical operation on multiple columns in a data.frame in R

I have the following data.frame. The "d" in the name of each variable stands for domain. Each domain has several items, for example domain 1 has item 1 (d1.1) and item 2 (d1.2); the second domain only has one item (d2.1). Each item has a "frequency", the frequency is represented by the letter "f" and a "gravity" represented by the letter "g", in such a way that the frequency of item 1 of domain 1 is d1.1f and the gravity of this item is d1.1g. Each row represents a subject.

df<-data.frame(d1.1f=c(0,1,1,1),d1.1g=c(0,0,1,1),d1.2f=c(1,0,1,1),d1.2g=c(1,0,0,1),d2.1f=c(0,2,3,1),d2.1g=c(3,1,0,1))
df

I would like to obtain, for each subject, the value of the column "tot_d?". The column "tot_d?" it is equal to the sum of the multiplication of gravity times the frequency of each item in each domain. For example for subject 1: tot_d1 = 0 * 0 + 1 * 1.

result<-data.frame(d1.1f=c(0,1,1,1),d1.1g=c(0,0,1,1),d1.2f=c(1,0,1,1),d1.2g=c(1,0,0,1),tot_d1=c(1,0,1,2),d2.1f=c(0,2,3,1),d2.1g=c(3,1,0,1),tot_d2=c(0,2,0,1))
result

I have tried dplyr but can't.

Thanks in advance

Upvotes: 0

Views: 141

Answers (3)

ktiu
ktiu

Reputation: 2636

Here is my attempt at a generalized version:

library(dplyr)

df %>%
  mutate(subject = seq(nrow(.))) %>%
  tidyr::pivot_longer(cols          = -subject,
                      names_pattern = "d([0-9]+)\\.([0-9]+)(f|g)",
                      names_to      = c("domain", "item", ".value")) %>%
  group_by(subject, domain) %>%
  summarise(tot = sum(f * g)) %>%
  tidyr::pivot_wider(id_cols      = subject,
                     names_from   = domain,
                     values_from  = tot,
                     names_prefix = "tot_d")

Result:

# A tibble: 4 x 3
# Groups:   subject [4]
  subject tot_d1 tot_d2
    <int>  <dbl>  <dbl>
1       1      1      0
2       2      0      2
3       3      1      0
4       4      2      1

(Data used:)

df <- structure(list(d1.1f = c(0, 1, 1, 1), d1.1g = c(0, 0, 1, 1), d1.2f = c(1, 0, 1, 1), d1.2g = c(1, 0, 0, 1), d2.1f = c(0, 2, 3, 1), d2.1g = c(3, 1, 0, 1)), class = "data.frame", row.names = c(NA, -4L))

Upvotes: 0

Marcelo Avila
Marcelo Avila

Reputation: 2374

The follow should do:

df %>% mutate(
  tot_d1 = d1.1f * d1.1g + d1.2f * d1.2g, 
  tot_d2 = d2.1f * d2.1g, 
)
#>   d1.1f d1.1g d1.2f d1.2g d2.1f d2.1g tot_d1 tot_d2
#> 1     0     0     1     1     0     3      1      0
#> 2     1     0     0     0     2     1      0      2
#> 3     1     1     1     0     3     0      1      0
#> 4     1     1     1     1     1     1      2      1

Or are you trying to write a function that generalises for any amount of domains?

Upvotes: 0

r2evans
r2evans

Reputation: 161110

I'll demonstrate a double-reshape method that honors domain and subject. It requires adding a "row number" (rn) column to align back with the original data. Because of the double-reshaping, it is resilient to the number of domains and subjects in your real data (in case you actually have 3 domains, more subjects, etc).

library(dplyr)
library(tidyr) # pivot_*

df <- mutate(df, rn = row_number())
df %>%
  pivot_longer(-rn, names_pattern = "d(.+)\\.(.+)([fg])", 
               names_to=c("domain", "subj", ".value")) %>%
  group_by(rn, domain) %>%
  summarize(tot = sum(f*g)) %>%
  pivot_wider(rn, names_from = "domain", names_prefix = "tot_d", 
              values_from = "tot") %>%
  left_join(df, ., by = "rn")
#   d1.1f d1.1g d1.2f d1.2g d2.1f d2.1g rn tot_d1 tot_d2
# 1     0     0     1     1     0     3  1      1      0
# 2     1     0     0     0     2     1  2      0      2
# 3     1     1     1     0     3     0  3      1      0
# 4     1     1     1     1     1     1  4      2      1

Upvotes: 3

Related Questions