Jack
Jack

Reputation: 857

How to use ifelse and group_by together?

I have this data below. I would like to impute the large value of income by 10 times the value of the median. This is done using the following code:

df$income_imputed = ifelse(df$income > (10* median(df$income,na.rm = T)), (10* median(df$income,na.rm = T)), df$income)

However, i would like to do that for each country and year separately and not to the whole dataset. I know group_by could be helpful with such tasks, but i am not sure how to include both functions together.

    country  year   income
     <dbl> <dbl>    <dbl>
 1       1  1999     5000
 2       1  1999     5000
 3       1  1999 10000000
 4       1  1999     3000
 5       1  2000     4000
 6       1  2000     4000
 7       1  2000 20000000
 8       1  2000     4000
 9       2  1999    10000
10       2  1999    10000
11       2  1999 30000000
12       2  1999     4000
13       2  2000    12000
14       2  2000    12000
15       2  2000 40000000


df= structure(list(country = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 2, 2), year = c(1999, 1999, 1999, 1999, 2000, 2000, 2000, 
2000, 1999, 1999, 1999, 1999, 2000, 2000, 2000), income = c(5000, 
5000, 1e+07, 3000, 4000, 4000, 2e+07, 4000, 10000, 10000, 3e+07, 
4000, 12000, 12000, 4e+07), income2 = c(5000, 5000, 1e+05, 3000, 
4000, 4000, 1e+05, 4000, 10000, 10000, 1e+05, 4000, 12000, 12000, 
1e+05)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", 
"data.frame"))

Upvotes: 0

Views: 67

Answers (3)

Vivek-Prakash-US
Vivek-Prakash-US

Reputation: 51

Putting group_by() just before the calculation should do the trick

library(dplyr)

df <- df %>% 
      group_by(country, year) %>% 
      mutate(income2 = ifelse(income > 10 * median(income, na.rm =  TRUE), 
                              yes = 10 * median(income, na.rm = TRUE), 
                              no = income))

Upvotes: 1

cgvoller
cgvoller

Reputation: 879

df  
   country  year   income income2
         <dbl> <dbl>    <dbl>   <dbl>
     1       1  1999     5000    5000
     2       1  1999     5000    5000
     3       1  1999 10000000  100000
     4       1  1999     3000    3000
     5       1  2000     4000    4000
     6       1  2000     4000    4000
     7       1  2000 20000000  100000
     8       1  2000     4000    4000
     9       2  1999    10000   10000
    10       2  1999    10000   10000
    11       2  1999 30000000  100000
    12       2  1999     4000    4000
    13       2  2000    12000   12000
    14       2  2000    12000   12000
    15       2  2000 40000000  100000

Using dplyr:

  df %>% group_by(country,year) %>% mutate(income_imputed =ifelse(income > (10* median(income,na.rm = T)), (10* median(income,na.rm = T)),income))

Output:

 country  year   income income2 income_imputed
     <dbl> <dbl>    <dbl>   <dbl>          <dbl>
 1       1  1999     5000    5000           5000
 2       1  1999     5000    5000           5000
 3       1  1999 10000000  100000          50000
 4       1  1999     3000    3000           3000
 5       1  2000     4000    4000           4000
 6       1  2000     4000    4000           4000
 7       1  2000 20000000  100000          40000
 8       1  2000     4000    4000           4000
 9       2  1999    10000   10000          10000
10       2  1999    10000   10000          10000
11       2  1999 30000000  100000         100000
12       2  1999     4000    4000           4000
13       2  2000    12000   12000          12000
14       2  2000    12000   12000          12000
15       2  2000 40000000  100000         120000

Upvotes: 1

DataM
DataM

Reputation: 351

I believe this is what you expect :

my_df <- my_df %>% group_by(country, year) %>% mutate(income_imputed = ifelse(income > (10* median(income,na.rm = T)), (10* median(income,na.rm = T)), income))

Upvotes: 1

Related Questions