Alzbeta Ivanovicova
Alzbeta Ivanovicova

Reputation: 35

Rename/recode variable value in R based on condition using dplyr

I have a dataset dataExtended with variable CountryOther and n which is a count of wines in that particular country. CountryOther is character type and n is integer. What I want to do, is to rename values in CountryOther to Other in case the n <=20. I would like to do it with dyplr package and I am not sure how to do it and if to use only mutate or mutate_at.

As long as I wasn't able to do wrote the condition as stated above, I tried to do it manually as follows but it didn't work:

dataExtended$CountryOther <- dataExtended$Country
dataExtended %>% 
  mutate(CountryOther = recode(CountryOther,
                               China = "Other", 
                               Mexico = "Other", 
                               Slovakia = "Other",
                               Bulgaria = "Other", 
                               Canada = "Other", 
                               Croatia = "Other", 
                               Uruguay = "Other", 
                               Georgia = "Other", 
                               Turkey = "Other", 
                               Moldova = "Other", 
                               Slovenia = "Other", 
                               Hungary = "Other", 
                               Switzerland = "Other", 
                               Greece = "Other", 
                               Israel = "Other", 
                               Lebanon= "Other"))

Upvotes: 1

Views: 511

Answers (2)

Alzbeta Ivanovicova
Alzbeta Ivanovicova

Reputation: 35

in the end I created this code which works:

#New table with wine count
wineCount <- data %>% count(Country)
#Joining two tables together
dataExtended <- inner_join(wineCount, data, by = "Country")
# Creating new variable CountryOther
dataExtended$CountryOther <- dataExtended$Country
# Renaming count from n to WineCount
dataExtended <- rename(dataExtended, WineCount = n)
# Replacement of countries with WineCount<=20 to Other
dataExtended <- dataExtended %>% 
  mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))
# Final check
unique(dataExtended$CountryOther)

The problem was I needed to store changes into the dataframe, which I didn't do before (as you can see in my last comment):

dataExtended <- rename(dataExtended, WineCount = n)

and

dataExtended <- dataExtended %>% 
  mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))

I also tested your code and it works as well and additionally it looks neater. So thank you very much for your help.

Upvotes: 1

Martin Gal
Martin Gal

Reputation: 16978

Using the Red.csv from your link imported with readr::read_csv() creates a data.frame / tibble

#> data
# A tibble: 8,666 × 8
   Name                               Country     Region       Winery                   Rating NumberOf…¹ Price Year 
   <chr>                              <chr>       <chr>        <chr>                     <dbl>      <dbl> <dbl> <chr>
 1 Pomerol 2011                       France      Pomerol      Château La Providence       4.2        100 95    2011 
 2 Lirac 2017                         France      Lirac        Château Mont-Redon          4.3        100 15.5  2017 
 3 Erta e China Rosso di Toscana 2015 Italy       Toscana      Renzo Masi                  3.9        100  7.45 2015 
 4 Bardolino 2019                     Italy       Bardolino    Cavalchina                  3.5        100  8.72 2019 
 5 Ried Scheibner Pinot Noir 2016     Austria     Carnuntum    Markowitsch                 3.9        100 29.2  2016 
 6 Gigondas (Nobles Terrasses) 2017   France      Gigondas     Vieux Clocher               3.7        100 19.9  2017 
 7 Marion's Vineyard Pinot Noir 2016  New Zealand Wairarapa    Schubert                    4          100 43.9  2016 
 8 Red Blend 2014                     Chile       Itata Valley Viña La Causa               3.9        100 17.5  2014 
 9 Chianti 2015                       Italy       Chianti      Castello Montaùto           3.6        100 10.8  2015 
10 Tradition 2014                     France      Minervois    Domaine des Aires Hautes    3.5        100  6.9  2014 
# … with 8,656 more rows, and abbreviated variable name ¹​NumberOfRatings

Now with dplyrs help

library(dplyr)

data %>% 
  add_count(Country, name = "WineCount") %>% 
  mutate(CountryOther = ifelse(WineCount <= 20, "Other", Country))

we get

# A tibble: 8,666 × 10
   Name                               Country     Region       Winery      Rating Numbe…¹ Price Year  WineC…² Count…³
   <chr>                              <chr>       <chr>        <chr>        <dbl>   <dbl> <dbl> <chr>   <int> <chr>  
 1 Pomerol 2011                       France      Pomerol      Château La…    4.2     100 95    2011     2256 France 
 2 Lirac 2017                         France      Lirac        Château Mo…    4.3     100 15.5  2017     2256 France 
 3 Erta e China Rosso di Toscana 2015 Italy       Toscana      Renzo Masi     3.9     100  7.45 2015     2650 Italy  
 4 Bardolino 2019                     Italy       Bardolino    Cavalchina     3.5     100  8.72 2019     2650 Italy  
 5 Ried Scheibner Pinot Noir 2016     Austria     Carnuntum    Markowitsch    3.9     100 29.2  2016      220 Austria
 6 Gigondas (Nobles Terrasses) 2017   France      Gigondas     Vieux Cloc…    3.7     100 19.9  2017     2256 France 
 7 Marion's Vineyard Pinot Noir 2016  New Zealand Wairarapa    Schubert       4       100 43.9  2016       63 New Ze…
 8 Red Blend 2014                     Chile       Itata Valley Viña La Ca…    3.9     100 17.5  2014      326 Chile  
 9 Chianti 2015                       Italy       Chianti      Castello M…    3.6     100 10.8  2015     2650 Italy  
10 Tradition 2014                     France      Minervois    Domaine de…    3.5     100  6.9  2014     2256 France
# … with 8,656 more rows, and abbreviated variable names ¹​NumberOfRatings, ²​WineCount, ³​CountryOther

We can filter for WineCount <= 30:

# A tibble: 125 × 10
   Name                                  Country     Region         Winery Rating Numbe…¹ Price Year  WineC…² Count…³
   <chr>                                 <chr>       <chr>          <chr>   <dbl>   <dbl> <dbl> <chr>   <int> <chr>  
 1 Steiner 2013                          Hungary     Sopron         Wenin…    3.7     100 24.5  2013        9 Other  
 2 Viile Metamorfosis Merlot 2015        Romania     Dealu Mare     Vitis…    3.5     102  7.5  2015       23 Romania
 3 Halkidiki Limnio - Merlot 2013        Greece      Chalkidiki     Tsant…    3.2     105 12.5  2013       13 Other  
 4 Cabernet Sauvignon 2013               Mexico      Valle de Guad… L. A.…    3.4    1066  8.65 2013        1 Other  
 5 Driopi Classic Agiorgitiko Nemea 2017 Greece      Nemea          Κτημα…    3.7     107 11.5  2017       13 Other  
 6 Malbec de Purcari 2018                Moldova     South Eastern  Châte…    4.1     107 12.0  2018        8 Other  
 7 Cabernet Sauvignon de Purcari 2017    Moldova     South Eastern  Châte…    4.1    1082 13.0  2017        8 Other  
 8 Cabernet Sauvignon 2016               Romania     Samburesti     Caste…    3.3     112  7.9  2016       23 Romania
 9 Aigle Les Murailles Rouge 2015        Switzerland Aigle          Henri…    3.7     112 23.2  2015       12 Other  
10 Γουμένισσα (Goumenissa) 2015          Greece      Goumenissa     Chatz…    3.7     115 20    2015       13 Other

to check the desired output: There are several rows filled with "Other" in column CountryOther.

Upvotes: 1

Related Questions