Calculate Ratios in columns, based on other columns

Question

I am facing a thinking and programming problem. My question is below. I have no clue what a proper approach is (I played with dplyr's group_by, but without results). Many thanks in advance for trying to help me out!

I have a data set like this:

Numbers   Area      Cluster  
1         A          1            
0.8       A          1
0.78      A          1
0.7       B          1
0.4       A          2
0         C          1

I want to calculate two new columns:

Example_1: the percentage of rows in a given cluster that belong to each Area.
Example_2: per cluster, a new index of the Numbers column, scaled to the range 1 to 0. (Note: the values in the example are only illustrative and the scaling could be done differently, but I want to make sure that the Numbers column is leading.)
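Written out per row i, with x_i the Numbers value, a_i the Area and c_i the Cluster, the two target columns can be read as below. The Example_2 formula is an assumption: dividing Numbers by the cluster maximum is one reading of "index from 1 to 0" that reproduces the expected table, including the value 1 for the single row of cluster 2 (min-max scaling would give 0/0 there).

```latex
\text{Example\_1}_i = \frac{\#\{\, j : c_j = c_i,\ a_j = a_i \,\}}{\#\{\, j : c_j = c_i \,\}},
\qquad
\text{Example\_2}_i = \frac{x_i}{\max_{j : c_j = c_i} x_j}
```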

The result should be like this:

Numbers   Area      Cluster  Example_1   Example_2
1         A          1          60%         1
0.8       A          1          60%         0.8
0.78      A          1          60%         0.78
0.7       B          1          20%         0.7
0.4       A          2         100%         1
0         C          1          20%         0

Cluster 1 has 5 rows, 3 of which are Area A, so Example_1 for those rows is 3/5 = 60%.
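For comparison, here is a minimal base-R sketch that reproduces both columns of the target table without any packages. It uses `ave()`, which applies a function within each group and returns a vector aligned with the original rows. The Example_2 rule (divide by the cluster maximum) is an assumption chosen because it matches the table. Example_1 is stored as a fraction (0.6) rather than the string "60%".

```r
df <- data.frame(numbers = c(1, .8, .78, .7, .4, 0),
                 area    = c("A", "A", "A", "B", "A", "C"),
                 cluster = c(1, 1, 1, 1, 2, 1))

# Row counts per cluster and per (area, cluster), repeated on every row of the group
rows_in_cluster      <- ave(df$numbers, df$cluster, FUN = length)
rows_in_area_cluster <- ave(df$numbers, df$area, df$cluster, FUN = length)

# Example_1: share of the cluster's rows that belong to this row's area
df$example_1 <- rows_in_area_cluster / rows_in_cluster

# Example_2 (assumed definition): numbers divided by the maximum of its cluster
df$example_2 <- df$numbers / ave(df$numbers, df$cluster, FUN = max)

df
```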

erocoar · Accepted Answer

Since you want to keep all rows, you can calculate the relative frequencies as follows:

library(tidyverse)
df <- data.frame(numbers = c(1, .8, .78, .7, .4, 0),
                 area = c("A", "A", "A", "B", "A", "C"),
                 cluster = c(1, 1, 1, 1, 2, 1))

df %>% 
  group_by(cluster) %>%
  mutate(example_1 = n()) %>%            # temporarily store the number of rows per cluster
  group_by(area, cluster) %>%
  mutate(example_1 = n() / example_1)    # rows per area within the cluster / rows per cluster

# A tibble: 6 x 4
# Groups:   area, cluster [4]
  numbers area  cluster example_1
1    1    A           1       0.6
2    0.8  A           1       0.6
3    0.78 A           1       0.6
4    0.7  B           1       0.2
5    0.4  A           2       1  
6    0    C           1       0.2
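The answer above covers Example_1 only. A minimal sketch extending the same pipeline with Example_2, again assuming Example_2 means Numbers divided by the maximum within its cluster, is below. It calls `ungroup()` at the end so the result is no longer grouped, and loads only dplyr rather than the whole tidyverse.

```r
library(dplyr)

df <- data.frame(numbers = c(1, .8, .78, .7, .4, 0),
                 area    = c("A", "A", "A", "B", "A", "C"),
                 cluster = c(1, 1, 1, 1, 2, 1))

result <- df %>%
  group_by(cluster) %>%
  mutate(n_cluster = n(),                       # rows per cluster
         example_2 = numbers / max(numbers)) %>%  # assumed: scale by cluster maximum
  group_by(area, cluster) %>%
  mutate(example_1 = n() / n_cluster) %>%       # share of the cluster's rows in this area
  ungroup() %>%
  select(numbers, area, cluster, example_1, example_2)  # drop the helper column

result
```

To display Example_1 as a percentage, `sprintf("%.0f%%", 100 * result$example_1)` turns 0.6 into "60%".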
