Reputation: 1352
I am facing an thinking & programming problem. See below my question, I have no clue what a proper approach is (played with DPLYR's group_by, but without results). Many thanks in advance for trying helping me out here!
I have a data set like this:
Numbers Area Cluster
1 A 1
0.8 A 1
0.78 A 1
0.7 B 1
0.4 A 2
0 C 1
I want to calculate two new columns:
The result should be like this:
Numbers Area Cluster Example_1 Example_2
1 A 1 60% #5x cluster 1, and 3x Area A) 1
0.8 A 1 60% 0.8
0.78 A 1 60% 0.78
0.7 B 1 20% 0.7
0.4 A 2 100% 1
0 C 1 20% 0
Upvotes: 0
Views: 390
Reputation: 1718
You can also do with data.table
:
library(magrittr)
library(data.table)
df <- data.table(Numbers = c(1, .8, .78, .7, .4, 0),
Area = c(rep("A", 3), "B", "A", "C"),
Cluster = c(rep(1, 4), 2, 1))
df[, N := .N, by = c("Cluster")] %>%
.[, Example_1 := .N/N, by = c("Cluster", "Area")] %>%
.[, `:=`(N = NULL, Example_2 = Numbers)]
Output:
> df
Numbers Area Cluster Example_1 Example_2
1: 1.00 A 1 0.6 1.00
2: 0.80 A 1 0.6 0.80
3: 0.78 A 1 0.6 0.78
4: 0.70 B 1 0.2 0.70
5: 0.40 A 2 1.0 0.40
6: 0.00 C 1 0.2 0.00
Upvotes: 1
Reputation: 5893
Since you want to keep all rows, you can calculate the relative frequencies as follows:
library(tidyverse)
df <- data.frame(numbers = c(1, .8, .78, .7, .4, 0),
area = c("A", "A", "A", "B", "A", "C"),
cluster = c(1, 1, 1, 1, 2, 1))
df %>%
group_by(cluster) %>%
mutate(example_1 = n()) %>%
group_by(area, cluster) %>%
mutate(example_1 = n() / example_1)
# A tibble: 6 x 4
# Groups: area, cluster [4]
numbers area cluster example_1
<dbl> <fct> <dbl> <dbl>
1 1 A 1 0.6
2 0.8 A 1 0.6
3 0.78 A 1 0.6
4 0.7 B 1 0.2
5 0.4 A 2 1
6 0 C 1 0.2
Upvotes: 2