Reputation: 85
I have the following data.frame. The "d" in the name of each variable stands for domain. Each domain has several items, for example domain 1 has item 1 (d1.1) and item 2 (d1.2); the second domain only has one item (d2.1). Each item has a "frequency", the frequency is represented by the letter "f" and a "gravity" represented by the letter "g", in such a way that the frequency of item 1 of domain 1 is d1.1f and the gravity of this item is d1.1g. Each row represents a subject.
df<-data.frame(d1.1f=c(0,1,1,1),d1.1g=c(0,0,1,1),d1.2f=c(1,0,1,1),d1.2g=c(1,0,0,1),d2.1f=c(0,2,3,1),d2.1g=c(3,1,0,1))
df
I would like to obtain, for each subject, the value of the column "tot_d?". The column "tot_d?" it is equal to the sum of the multiplication of gravity times the frequency of each item in each domain. For example for subject 1: tot_d1 = 0 * 0 + 1 * 1.
result<-data.frame(d1.1f=c(0,1,1,1),d1.1g=c(0,0,1,1),d1.2f=c(1,0,1,1),d1.2g=c(1,0,0,1),tot_d1=c(1,0,1,2),d2.1f=c(0,2,3,1),d2.1g=c(3,1,0,1),tot_d2=c(0,2,0,1))
result
I have tried dplyr but can't.
Thanks in advance
Upvotes: 0
Views: 141
Reputation: 2636
Here is my attempt at a generalized version:
library(dplyr)
df %>%
mutate(subject = seq(nrow(.))) %>%
tidyr::pivot_longer(cols = -subject,
names_pattern = "d([0-9]+)\\.([0-9]+)(f|g)",
names_to = c("domain", "item", ".value")) %>%
group_by(subject, domain) %>%
summarise(tot = sum(f * g)) %>%
tidyr::pivot_wider(id_cols = subject,
names_from = domain,
values_from = tot,
names_prefix = "tot_d")
Result:
# A tibble: 4 x 3
# Groups: subject [4]
subject tot_d1 tot_d2
<int> <dbl> <dbl>
1 1 1 0
2 2 0 2
3 3 1 0
4 4 2 1
(Data used:)
df <- structure(list(d1.1f = c(0, 1, 1, 1), d1.1g = c(0, 0, 1, 1), d1.2f = c(1, 0, 1, 1), d1.2g = c(1, 0, 0, 1), d2.1f = c(0, 2, 3, 1), d2.1g = c(3, 1, 0, 1)), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 0
Reputation: 2374
The follow should do:
df %>% mutate(
tot_d1 = d1.1f * d1.1g + d1.2f * d1.2g,
tot_d2 = d2.1f * d2.1g,
)
#> d1.1f d1.1g d1.2f d1.2g d2.1f d2.1g tot_d1 tot_d2
#> 1 0 0 1 1 0 3 1 0
#> 2 1 0 0 0 2 1 0 2
#> 3 1 1 1 0 3 0 1 0
#> 4 1 1 1 1 1 1 2 1
Or are you trying to write a function that generalises for any amount of domains?
Upvotes: 0
Reputation: 161110
I'll demonstrate a double-reshape method that honors domain and subject. It requires adding a "row number" (rn
) column to align back with the original data. Because of the double-reshaping, it is resilient to the number of domains and subjects in your real data (in case you actually have 3 domains, more subjects, etc).
library(dplyr)
library(tidyr) # pivot_*
df <- mutate(df, rn = row_number())
df %>%
pivot_longer(-rn, names_pattern = "d(.+)\\.(.+)([fg])",
names_to=c("domain", "subj", ".value")) %>%
group_by(rn, domain) %>%
summarize(tot = sum(f*g)) %>%
pivot_wider(rn, names_from = "domain", names_prefix = "tot_d",
values_from = "tot") %>%
left_join(df, ., by = "rn")
# d1.1f d1.1g d1.2f d1.2g d2.1f d2.1g rn tot_d1 tot_d2
# 1 0 0 1 1 0 3 1 1 0
# 2 1 0 0 0 2 1 2 0 2
# 3 1 1 1 0 3 0 3 1 0
# 4 1 1 1 1 1 1 4 2 1
Upvotes: 3