Reputation: 415
Here is my sample data:
iso3c production reported_by_FAO
<chr> <dbl> <chr>
1 NOR 10740117. Yes
2 TKM 10726360. No
3 SYR 10559247. Yes
4 ISR 10317065. No
5 DOM 10152261. No
6 TJK 9741324. Yes
7 YEM 9554599. No
I want to mutate
a new column that shows the iso3c that is reported by FAO (ie reported_by_FAO == "Yes") which has the nearest production value for each row.
Using the sample data created above, this is what the output would look like.
iso3c production reported_by_FAO nearest_iso3c
<chr> <dbl> <chr> <chr>
1 NOR 10740117. Yes NOR
2 TKM 10726360. No NOR
3 SYR 10559247. Yes SYR
4 ISR 10317065. No SYR
5 DOM 10152261. No SYR
6 TJK 9741324. Yes TJK
7 YEM 9554599. No TJK
Upvotes: 1
Views: 48
Reputation: 886998
If the value of 'Yes' have any indication, then get the cumulative sum based on creating a logical vector with 'Yes' and use that as a grouping variable and create the 'nearest' as the first
value of 'iso3c'
library(dplyr)
df1 %>%
group_by(grp = cumsum(reported_by_FAO == 'Yes')) %>%
mutate(nearest = first(iso3c)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 7 x 4
# iso3c production reported_by_FAO nearest
# <chr> <dbl> <chr> <chr>
#1 NOR 10740117 Yes NOR
#2 TKM 10726360 No NOR
#3 SYR 10559247 Yes SYR
#4 ISR 10317065 No SYR
#5 DOM 10152261 No SYR
#6 TJK 9741324 Yes TJK
#7 YEM 9554599 No TJK
Based on the comments, we can extract the 'No', 'Yes' elements separately, use findInterval
to get the index and return the corresponding 'iso3c'
df1$nearest <- df1$iso3c
ino <- df1$reported_by_FAO == 'No'
iyes <- df1$reported_by_FAO == 'Yes'
df1$nearest[ino] <- sapply(df1$production[ino], function(x) {
val <- df1$production[iyes]
df1$iso3c[iyes][order(val)][findInterval(x, val[order(val)]) + 1]
})
-output
df1
# iso3c production reported_by_FAO nearest
#1 NOR 10740117 Yes NOR
#2 TKM 10726360 No NOR
#3 SYR 10559247 Yes SYR
#4 ISR 10317065 No SYR
#5 DOM 10152261 No SYR
#6 TJK 9741324 Yes TJK
#7 YEM 9554599 No TJK
Or another option is to get the abs
olute difference and find the index with which.min
df1$nearest[ino] <- sapply(df1$production[ino], function(x) {
val <- df1$production[iyes]
df1$iso3c[iyes][which.min(abs(val -x))]
})
df1 <- structure(list(iso3c = c("NOR", "TKM", "SYR", "ISR", "DOM", "TJK",
"YEM"), production = c(10740117, 10726360, 10559247, 10317065,
10152261, 9741324, 9554599), reported_by_FAO = c("Yes", "No",
"Yes", "No", "No", "Yes", "No")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7"))
Upvotes: 2