Reputation: 4100
I have a sample df like below:
df_test<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
"Sub_group_name"=c("A","A","B","C","D","E","C"),
"Total%"=c(35,26,10,9,5,11,13))
The original df is quite big and points to remember about this df:
Group1
all subgroups like A, B, C
etc. will add upto 100 & so for "Group2". Subgroups for both Group1 and Group2 will be more or less sameAsk:
I need to create a column called Category
which lets works on range of Total%
on an Group.Name
level. The conditions for creating a new column are:
For every Group.Name
whereever Total%
is highest, the category column is whatever the Sub_group_name
name is.
For every Group.Name
and Total%
between 10-30, the category column is "New_Group1".
For every Group.Name
and Total%
less than 10, the category column is "New_Group2".
Expected Output:
df_output<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
"Sub_group_name"=c("A","A","B","C","D","E","C"),
"Total%"=c(35,26,10,9,5,11,13),
"category"=c("A","A","New_Group1","New_Group1","New_Group2","New_Group1","New_Group1"))
Upvotes: 3
Views: 67
Reputation: 887531
We can do this with cut
to create the labels
with the corresponding breaks
and then replace the 'Total.' that is the highest in each 'Group.Name' with the correspoding 'Sub_group_name'
library(dplyr)
df_test %>%
group_by(Group.Name) %>%
mutate(category = as.character(cut(`Total%`, breaks = c(-Inf,10, 30, Inf),
labels = c("New_Group2", "New_Group1", "Other"), right = FALSE)),
category = case_when(`Total%` == max(`Total%`) ~
Sub_group_name,
TRUE ~ category))
# A tibble: 7 x 4
# Groups: Group.Name [2]
# Group.Name Sub_group_name `Total%` category
# <chr> <chr> <dbl> <chr>
#1 Group1 A 35 A
#2 Group2 A 26 A
#3 Group1 B 10 New_Group1
#4 Group2 C 9 New_Group2
#5 Group2 D 5 New_Group2
#6 Group2 E 11 New_Group1
#7 Group1 C 13 New_Group1
df_test<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2",
"Group2","Group1"),
"Sub_group_name"=c("A","A","B","C","D","E","C"),
"Total%"=c(35,26,10,9,5,11,13), stringsAsFactors = FALSE,
check.names = FALSE)
Upvotes: 1