Max H.
Max H.

Reputation: 29

Creating a new column in R based off certain criteria

so I have a dataset, df, that looks like this

Customer Spending($)
abc      10
abc      11
123      5
123      10
456      9
456      2

I want to create a new column, lets name this <$20 that would return the name of the customer whose sum of spending is less than $20 and return a null value if the customer spent over $20.

I would want the result to appear like this

Customer Spending($) <20
abc      10          null
abc      11          null
123      5           123
123      10          123
456      9           456
456      2           456

Any idea on how to go about this? I feel like this should be basic, but I am struggling. Thanks!

Upvotes: 0

Views: 35

Answers (2)

Karthik S
Karthik S

Reputation: 11546

Does this work:

> library(readr)
> library(dplyr)
> df %>% group_by(Customer) %>% mutate(total_spending = sum(parse_number(Spending))) %>% 
+   mutate(less_than_20 = case_when(total_spending < 20 ~ Customer, TRUE ~ 'null')) %>% select(-total_spending)
# A tibble: 6 x 3
# Groups:   Customer [3]
  Customer Spending less_than_20
  <chr>    <chr>    <chr>       
1 abc      $10      null        
2 abc      $11      null        
3 123      $5       123         
4 123      $10      123         
5 456      $9       456         
6 456      $2       456         
> 

Upvotes: 0

Duck
Duck

Reputation: 39613

Try this. You can directly embed the comparison, previously cleaning the Spending variable, and obtain the desired values. Here the code:

library(dplyr)
#Code
newdf <- df %>% group_by(Customer) %>%
  mutate(Var=ifelse(sum(as.numeric(gsub('$','',Spending,fixed = T)))>20,'null',Customer))

Output:

# A tibble: 6 x 3
# Groups:   Customer [3]
  Customer Spending Var  
  <chr>    <chr>    <chr>
1 abc      $10      null 
2 abc      $11      null 
3 123      $5       123  
4 123      $10      123  
5 456      $9       456  
6 456      $2       456  

Some data used:

#Data
df <- structure(list(Customer = c("abc", "abc", "123", "123", "456", 
"456"), Spending = c("$10", "$11", "$5", "$10", "$9", "$2")), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 1

Related Questions