Reputation: 29
so I have a dataset, df, that looks like this
Customer Spending($)
abc 10
abc 11
123 5
123 10
456 9
456 2
I want to create a new column, lets name this <$20 that would return the name of the customer whose sum of spending is less than $20 and return a null value if the customer spent over $20.
I would want the result to appear like this
Customer Spending($) <20
abc 10 null
abc 11 null
123 5 123
123 10 123
456 9 456
456 2 456
Any idea on how to go about this? I feel like this should be basic, but I am struggling. Thanks!
Upvotes: 0
Views: 35
Reputation: 11546
Does this work:
> library(readr)
> library(dplyr)
> df %>% group_by(Customer) %>% mutate(total_spending = sum(parse_number(Spending))) %>%
+ mutate(less_than_20 = case_when(total_spending < 20 ~ Customer, TRUE ~ 'null')) %>% select(-total_spending)
# A tibble: 6 x 3
# Groups: Customer [3]
Customer Spending less_than_20
<chr> <chr> <chr>
1 abc $10 null
2 abc $11 null
3 123 $5 123
4 123 $10 123
5 456 $9 456
6 456 $2 456
>
Upvotes: 0
Reputation: 39613
Try this. You can directly embed the comparison, previously cleaning the Spending
variable, and obtain the desired values. Here the code:
library(dplyr)
#Code
newdf <- df %>% group_by(Customer) %>%
mutate(Var=ifelse(sum(as.numeric(gsub('$','',Spending,fixed = T)))>20,'null',Customer))
Output:
# A tibble: 6 x 3
# Groups: Customer [3]
Customer Spending Var
<chr> <chr> <chr>
1 abc $10 null
2 abc $11 null
3 123 $5 123
4 123 $10 123
5 456 $9 456
6 456 $2 456
Some data used:
#Data
df <- structure(list(Customer = c("abc", "abc", "123", "123", "456",
"456"), Spending = c("$10", "$11", "$5", "$10", "$9", "$2")), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 1