Reputation: 35
I have a dataset dataExtended
with variable CountryOther
and n
which is a count of wines in that particular country. CountryOther
is character type and n
is integer. What I want to do, is to rename values in CountryOther
to Other in case the n <=20
. I would like to do it with dyplr package and I am not sure how to do it and if to use only mutate
or mutate_at
.
As long as I wasn't able to do wrote the condition as stated above, I tried to do it manually as follows but it didn't work:
dataExtended$CountryOther <- dataExtended$Country
dataExtended %>%
mutate(CountryOther = recode(CountryOther,
China = "Other",
Mexico = "Other",
Slovakia = "Other",
Bulgaria = "Other",
Canada = "Other",
Croatia = "Other",
Uruguay = "Other",
Georgia = "Other",
Turkey = "Other",
Moldova = "Other",
Slovenia = "Other",
Hungary = "Other",
Switzerland = "Other",
Greece = "Other",
Israel = "Other",
Lebanon= "Other"))
Upvotes: 1
Views: 511
Reputation: 35
in the end I created this code which works:
#New table with wine count
wineCount <- data %>% count(Country)
#Joining two tables together
dataExtended <- inner_join(wineCount, data, by = "Country")
# Creating new variable CountryOther
dataExtended$CountryOther <- dataExtended$Country
# Renaming count from n to WineCount
dataExtended <- rename(dataExtended, WineCount = n)
# Replacement of countries with WineCount<=20 to Other
dataExtended <- dataExtended %>%
mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))
# Final check
unique(dataExtended$CountryOther)
The problem was I needed to store changes into the dataframe, which I didn't do before (as you can see in my last comment):
dataExtended <- rename(dataExtended, WineCount = n)
and
dataExtended <- dataExtended %>%
mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))
I also tested your code and it works as well and additionally it looks neater. So thank you very much for your help.
Upvotes: 1
Reputation: 16978
Using the Red.csv
from your link imported with readr::read_csv()
creates a data.frame
/ tibble
#> data
# A tibble: 8,666 × 8
Name Country Region Winery Rating NumberOf…¹ Price Year
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr>
1 Pomerol 2011 France Pomerol Château La Providence 4.2 100 95 2011
2 Lirac 2017 France Lirac Château Mont-Redon 4.3 100 15.5 2017
3 Erta e China Rosso di Toscana 2015 Italy Toscana Renzo Masi 3.9 100 7.45 2015
4 Bardolino 2019 Italy Bardolino Cavalchina 3.5 100 8.72 2019
5 Ried Scheibner Pinot Noir 2016 Austria Carnuntum Markowitsch 3.9 100 29.2 2016
6 Gigondas (Nobles Terrasses) 2017 France Gigondas Vieux Clocher 3.7 100 19.9 2017
7 Marion's Vineyard Pinot Noir 2016 New Zealand Wairarapa Schubert 4 100 43.9 2016
8 Red Blend 2014 Chile Itata Valley Viña La Causa 3.9 100 17.5 2014
9 Chianti 2015 Italy Chianti Castello Montaùto 3.6 100 10.8 2015
10 Tradition 2014 France Minervois Domaine des Aires Hautes 3.5 100 6.9 2014
# … with 8,656 more rows, and abbreviated variable name ¹NumberOfRatings
Now with dplyr
s help
library(dplyr)
data %>%
add_count(Country, name = "WineCount") %>%
mutate(CountryOther = ifelse(WineCount <= 20, "Other", Country))
we get
# A tibble: 8,666 × 10
Name Country Region Winery Rating Numbe…¹ Price Year WineC…² Count…³
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <chr>
1 Pomerol 2011 France Pomerol Château La… 4.2 100 95 2011 2256 France
2 Lirac 2017 France Lirac Château Mo… 4.3 100 15.5 2017 2256 France
3 Erta e China Rosso di Toscana 2015 Italy Toscana Renzo Masi 3.9 100 7.45 2015 2650 Italy
4 Bardolino 2019 Italy Bardolino Cavalchina 3.5 100 8.72 2019 2650 Italy
5 Ried Scheibner Pinot Noir 2016 Austria Carnuntum Markowitsch 3.9 100 29.2 2016 220 Austria
6 Gigondas (Nobles Terrasses) 2017 France Gigondas Vieux Cloc… 3.7 100 19.9 2017 2256 France
7 Marion's Vineyard Pinot Noir 2016 New Zealand Wairarapa Schubert 4 100 43.9 2016 63 New Ze…
8 Red Blend 2014 Chile Itata Valley Viña La Ca… 3.9 100 17.5 2014 326 Chile
9 Chianti 2015 Italy Chianti Castello M… 3.6 100 10.8 2015 2650 Italy
10 Tradition 2014 France Minervois Domaine de… 3.5 100 6.9 2014 2256 France
# … with 8,656 more rows, and abbreviated variable names ¹NumberOfRatings, ²WineCount, ³CountryOther
We can filter for WineCount <= 30
:
# A tibble: 125 × 10
Name Country Region Winery Rating Numbe…¹ Price Year WineC…² Count…³
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <chr>
1 Steiner 2013 Hungary Sopron Wenin… 3.7 100 24.5 2013 9 Other
2 Viile Metamorfosis Merlot 2015 Romania Dealu Mare Vitis… 3.5 102 7.5 2015 23 Romania
3 Halkidiki Limnio - Merlot 2013 Greece Chalkidiki Tsant… 3.2 105 12.5 2013 13 Other
4 Cabernet Sauvignon 2013 Mexico Valle de Guad… L. A.… 3.4 1066 8.65 2013 1 Other
5 Driopi Classic Agiorgitiko Nemea 2017 Greece Nemea Κτημα… 3.7 107 11.5 2017 13 Other
6 Malbec de Purcari 2018 Moldova South Eastern Châte… 4.1 107 12.0 2018 8 Other
7 Cabernet Sauvignon de Purcari 2017 Moldova South Eastern Châte… 4.1 1082 13.0 2017 8 Other
8 Cabernet Sauvignon 2016 Romania Samburesti Caste… 3.3 112 7.9 2016 23 Romania
9 Aigle Les Murailles Rouge 2015 Switzerland Aigle Henri… 3.7 112 23.2 2015 12 Other
10 Γουμένισσα (Goumenissa) 2015 Greece Goumenissa Chatz… 3.7 115 20 2015 13 Other
to check the desired output: There are several rows filled with "Other"
in column CountryOther
.
Upvotes: 1