Reputation: 1335
I have the following dataframe:
df <- structure(list(country = c("Ghana", "Eritrea", "Ethiopia", "Ethiopia",
"Congo - Kinshasa", "Ethiopia", "Ethiopia", "Ghana", "Botswana",
"Nigeria"), CommodRank = c(1L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 1L,
1L), topCommodInCountry = c(TRUE, FALSE, FALSE, TRUE, FALSE,
TRUE, TRUE, TRUE, TRUE, TRUE), Main_Commod = c("Gold", "Copper",
"Nickel", "Gold", "Gold", "Gold", "Gold", "Gold", "Diamonds",
"Iron Ore")), row.names = c(NA, -10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "country", drop = TRUE, indices = list(
8L, 4L, 1L, c(2L, 3L, 5L, 6L), c(0L, 7L), 9L), group_sizes = c(1L,
1L, 1L, 4L, 2L, 1L), biggest_group_size = 4L, labels = structure(list(
country = c("Botswana", "Congo - Kinshasa", "Eritrea", "Ethiopia",
"Ghana", "Nigeria")), row.names = c(NA, -6L), class = "data.frame", vars = "country", drop = TRUE, .Names = "country"), .Names = c("country",
"CommodRank", "topCommodInCountry", "Main_Commod"))
df
country CommodRank topCommodInCountry Main_Commod
1 Ghana 1 TRUE Gold
2 Eritrea 2 FALSE Copper
3 Ethiopia 3 FALSE Nickel
4 Ethiopia 1 TRUE Gold
5 Congo - Kinshasa 3 FALSE Gold
6 Ethiopia 1 TRUE Gold
7 Ethiopia 1 TRUE Gold
8 Ghana 1 TRUE Gold
9 Botswana 1 TRUE Diamonds
10 Nigeria 1 TRUE Iron Ore
I am trying to add another column showing the top commodity (top CommodRank) for every country in this dataset, but I'm not sure how. I'm able to label 'topcommod' with the 'Main_Commod' where CommodRank == 1, but I want to copy this same value to cases where CommodRank != 1. Looking below, both Ethiopia values at rows 3 & 4 should read 'Gold'.
df %>% mutate(topcommod = ifelse(CommodRank == 1, Main_Commod, 'unknown'))
country CommodRank topCommodInCountry Main_Commod topcommod
1 Ghana 1 TRUE Gold Gold
2 Eritrea 2 FALSE Copper unknown
3 Ethiopia 3 FALSE Nickel unknown
4 Ethiopia 1 TRUE Gold Gold
5 Congo - Kinshasa 3 FALSE Gold unknown
6 Ethiopia 1 TRUE Gold Gold
7 Ethiopia 1 TRUE Gold Gold
8 Ghana 1 TRUE Gold Gold
9 Botswana 1 TRUE Diamonds Diamonds
10 Nigeria 1 TRUE Iron Ore Iron Ore
I'm ideally looking for a dplyr solution I can add to an existing long series of pipe %>% function calls, but any solution would help.
Upvotes: 4
Views: 114
Reputation: 70336
IIUC, there are multiple ways to do this, for example:
df %>% mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else Main_Commod[which.max(topCommodInCountry)])
# A tibble: 10 x 5
# Groups: country [6]
country CommodRank topCommodInCountry Main_Commod topCom
<chr> <int> <lgl> <chr> <chr>
1 Ghana 1 TRUE Gold Gold
2 Eritrea 2 FALSE Copper unknown
3 Ethiopia 3 FALSE Nickel Gold
4 Ethiopia 1 TRUE Gold Gold
5 Congo - Kinshasa 3 FALSE Gold unknown
6 Ethiopia 1 TRUE Gold Gold
7 Ethiopia 1 TRUE Gold Gold
8 Ghana 1 TRUE Gold Gold
9 Botswana 1 TRUE Diamonds Diamonds
10 Nigeria 1 TRUE Iron Ore Iron Ore
Regarding OP's question in comment how to handle ties of multiple top Commodities, you could do the following:
df %>%
mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else paste(unique(Main_Commod[topCommodInCountry]), collapse = "/"))
If there are multiple unique top Commodities in a country, they will be paste together into a single string, separated by /
.
Upvotes: 5
Reputation: 4873
It's not an answer but learning greatly from @docendo discimus answer, it took me a second to understand the "if negative" (!any(topCommodInCountry)
), and I was wondering if it's only me or it would take my computer a second more to do that too :)
Using the same dataset I examined the idea of making the if else
positive. First I tested for identical
between the two solutions:
identical(
#Negative
df %>%
mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else Main_Commod[which.max(topCommodInCountry)]),
#Positive
df %>%
mutate(topCom = if(any(topCommodInCountry)) Main_Commod[which.max(topCommodInCountry)]
else "unknown"))
[1] TRUE
Next, I tested the benchmark of the two:
require(rbenchmark)
benchmark("Negative" = {
df %>%
mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else Main_Commod[which.max(topCommodInCountry)])
},
"Positive" = {
df %>%
mutate(topCom = if(any(topCommodInCountry)) Main_Commod[which.max(topCommodInCountry)]
else "unknown")
},
replications = 10000,
columns = c("test", "replications", "elapsed",
"relative", "user.self", "sys.self"))
The difference is not that big but I'm assuming that with a bigger dataset it will increase.
test replications elapsed relative user.self sys.self
1 Negative 10000 12.59 1.015 12.44 0
2 Positive 10000 12.41 1.000 12.30 0
Upvotes: 0
Reputation: 10671
another pattern with dplyr
...
df %>% arrange(CommodRank) %>%
mutate(topCommod = Main_Commod[1])
Upvotes: 0