Reputation: 2290
I am working on defining a temporal network with Pajek software.
Below are the data and the code that I am using:
library(data.table)
Aggregated <- fread("
act1_1 act1_2 act1_3 act1_4 act1_5
2 1 3 2 6
1 2 2 1 1
1 4 2 2 3
")
cols <- names(Aggregated)
n <- length(cols)
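# All (row, window length, start column) combinations whose window fits in n columns
vi <- CJ(rn = 1:nrow(Aggregated), len = 2:5, start = 1:n)[
  , end := start + len - 1L][
  end <= n]
# Long format: one row per (rn, pos), with pos as the integer column index
dl <- melt(setDT(Aggregated)[, rn := .I], id.vars = "rn", variable.name = "pos",
           variable.factor = TRUE)[
  , pos := as.integer(pos)][]
# Non-equi join collects the values inside each window, then counts identical sequences
result <- dl[vi, on = .(rn, pos >= start, pos <= end),
             .(rn, values = toString(value), position = toString(cols[x.pos])),
             by = .EACHI, nomatch = 0L][
  , .(freq = .N), by = .(values, position)]
result[order(nchar(values), values)]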
Below is the output:
           values                               position freq
 1:          1, 1                         act1_4, act1_5    1
 2:          1, 2                         act1_1, act1_2    1
 3:          1, 3                         act1_2, act1_3    1
 4:          1, 4                         act1_1, act1_2    1
 5:          2, 1                         act1_1, act1_2    1
 6:          2, 1                         act1_3, act1_4    1
 7:          2, 2                         act1_2, act1_3    1
 8:          2, 2                         act1_3, act1_4    1
 9:          2, 3                         act1_4, act1_5    1
10:          2, 6                         act1_4, act1_5    1
11:          3, 2                         act1_3, act1_4    1
12:          4, 2                         act1_2, act1_3    1
13:       1, 2, 2                 act1_1, act1_2, act1_3    1
14:       1, 3, 2                 act1_2, act1_3, act1_4    1
15:       1, 4, 2                 act1_1, act1_2, act1_3    1
16:       2, 1, 1                 act1_3, act1_4, act1_5    1
17:       2, 1, 3                 act1_1, act1_2, act1_3    1
18:       2, 2, 1                 act1_2, act1_3, act1_4    1
19:       2, 2, 3                 act1_3, act1_4, act1_5    1
20:       3, 2, 6                 act1_3, act1_4, act1_5    1
21:       4, 2, 2                 act1_2, act1_3, act1_4    1
22:    1, 2, 2, 1         act1_1, act1_2, act1_3, act1_4    1
23:    1, 3, 2, 6         act1_2, act1_3, act1_4, act1_5    1
24:    1, 4, 2, 2         act1_1, act1_2, act1_3, act1_4    1
25:    2, 1, 3, 2         act1_1, act1_2, act1_3, act1_4    1
26:    2, 2, 1, 1         act1_2, act1_3, act1_4, act1_5    1
27:    4, 2, 2, 3         act1_2, act1_3, act1_4, act1_5    1
28: 1, 2, 2, 1, 1 act1_1, act1_2, act1_3, act1_4, act1_5    1
29: 1, 4, 2, 2, 3 act1_1, act1_2, act1_3, act1_4, act1_5    1
30: 2, 1, 3, 2, 6 act1_1, act1_2, act1_3, act1_4, act1_5    1
My question is how to create another column that sums the frequencies of rows with the same values, such as:
   values       position freq Sum of freq
5:   2, 1 act1_1, act1_2    1           2
6:   2, 1 act1_3, act1_4    1
7:   2, 2 act1_2, act1_3    1           2
8:   2, 2 act1_3, act1_4    1
Upvotes: 0
Views: 91
Reputation: 9505
Maybe this could be helpful:
library(data.table)
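# ... this is the last line of your code, assigned to df
df <- result[order(nchar(values), values)]
# add the total frequency of each distinct `values` string to every row of its group
df[, summed := sum(freq), by = values]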
df
    values       position freq summed
 1:   1, 1 act1_4, act1_5    1      1
 2:   1, 2 act1_1, act1_2    1      1
 3:   1, 3 act1_2, act1_3    1      1
 4:   1, 4 act1_1, act1_2    1      1
 5:   2, 1 act1_1, act1_2    1      2
 6:   2, 1 act1_3, act1_4    1      2
 7:   2, 2 act1_2, act1_3    1      2
 8:   2, 2 act1_3, act1_4    1      2
 9:   2, 3 act1_4, act1_5    1      1
10:   2, 6 act1_4, act1_5    1      1
11:   3, 2 act1_3, act1_4    1      1
...
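EDIT: If you want the total shown only on the first row of each group, you can try this:
# duplicated() is TRUE for every repeat of a value, so only the first occurrence keeps its total
df[, sm := ifelse(duplicated(values), NA, summed)]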
df
    values       position freq summed sm
 1:   1, 1 act1_4, act1_5    1      1  1
 2:   1, 2 act1_1, act1_2    1      1  1
 3:   1, 3 act1_2, act1_3    1      1  1
 4:   1, 4 act1_1, act1_2    1      1  1
 5:   2, 1 act1_1, act1_2    1      2  2
 6:   2, 1 act1_3, act1_4    1      2 NA
 7:   2, 2 act1_2, act1_3    1      2  2
 8:   2, 2 act1_3, act1_4    1      2 NA
 9:   2, 3 act1_4, act1_5    1      1  1
10:   2, 6 act1_4, act1_5    1      1  1
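If you want to try both steps without rerunning the whole question, here is a minimal sketch on a small made-up table. `toy` is a hypothetical stand-in for a few rows of `result`, not the real data, and it assumes `freq` is an integer column (which is what `.N` produces).

```r
library(data.table)

# Hypothetical stand-in for a few rows of `result` (assumption, not the full table)
toy <- data.table(
  values   = c("1, 1", "2, 1", "2, 1", "2, 2", "2, 2"),
  position = c("act1_4, act1_5", "act1_1, act1_2", "act1_3, act1_4",
               "act1_2, act1_3", "act1_3, act1_4"),
  freq     = c(1L, 1L, 1L, 1L, 1L)
)

# Total frequency per distinct `values`, repeated on every row of the group
toy[, summed := sum(freq), by = values]

# Keep the total only on the first row of each group; repeats get NA
toy[, sm := ifelse(duplicated(values), NA, summed)]

print(toy)
```

Because `:=` modifies the table by reference, no reassignment is needed. Note that `duplicated()` only blanks the right rows if you want the total on the first occurrence of each value in the current row order.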
Upvotes: 1
Reputation: 97
It might not be pretty and could be a bit tedious, but maybe you could use
sum_of_frequencies <- c(sum(df$freq[df$values == "4,4"]),
sum(df$freq[df$values == "12,4"]),
...)
Of course, you would have to do this for every value you have, and depending on how many there are, this could take a while. Each string also has to match the values column exactly, including the space that toString() puts after each comma (for example "2, 1"). Then, if you want to see it:
values <- c("4,4", "12,4", ...)
see_sum_of_freq <- data.frame(sum_of_frequencies, values)
which, again, depending on how many you have, could take a while.
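If there are many distinct values, a grouped sum might save the typing, since it computes every per-value total in one call instead of one `sum()` per value. A rough sketch, assuming the table has the `values` and `freq` columns shown in the question; `df_toy` is a hypothetical stand-in with made-up rows.

```r
library(data.table)

# Hypothetical stand-in for `df` (assumption: same `values` and `freq` columns)
df_toy <- data.table(
  values = c("1, 1", "2, 1", "2, 1", "2, 2", "2, 2"),
  freq   = c(1L, 1L, 1L, 1L, 1L)
)

# data.table: one row per distinct value, with its total frequency
see_sum_of_freq <- df_toy[, .(sum_of_frequencies = sum(freq)), by = values]
print(see_sum_of_freq)

# Base R equivalent on a plain data.frame
base_version <- aggregate(freq ~ values, data = as.data.frame(df_toy), FUN = sum)
print(base_version)
```

Both versions group on the exact string in `values`, so there is no need to type out each value by hand.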
Upvotes: 1