Reputation: 97
I have a df that looks like:
# Groups: telefono1 [360]
nombre telefono1 telefono2 ads_ao ads_am ads_mas ads_ma total_price_ao total_price_am total_price_mas~ total_price_ma
<chr> <chr> <lgl> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 CRIST~ 60***0400 NA 0 0 0 0 0 0 0 0
2 CRIST~ 60***0400 NA 0 61 0 0 0 0 0 0
3 CRIST~ 60***0400 NA 0 0 0 0 0 0 0 0
4 DIEGO~ 60***0844 NA 0 0 0 0 0 0 0 0
5 Diego 60***0844 FALSE 0 0 0 2 0 0 0 22500
6 TALLE~ 60***4848 NA 2 0 0 0 59000 0 0 0
7 TALLE~ 60***4848 NA 0 311 0 0 0 0 0 0
8 Rub?n 60***4848 FALSE 0 0 0 2 0 0 0 59000
I need to group by telefono1 and then keep the entire row which value across columns "ads_ao, ads,am, ads_mas and ads_ma" was the maximum value.
In the example, the right solution were rows #2, #5 and #7 because they are the ones, groupping by telefono1, that have the maximum value across mentioned columns.
I tried it with dplyr
group_by(telefono1) %>%
mutate(max_value = max(ads_ao, ads_am, ads_mas, ads_ma)) %>%
ungroup()
But with this method I get a new variable with max value in all rows and then I can't use a top_n() to get the row with the max value because all are equal.
Please, anyone know any solution? Thanks!!
Upvotes: 2
Views: 99
Reputation: 887118
If it is to filter
the row with the max values across
columns that starts with 'ads', create the 'max_value' column(it is just to show the steps more clearly) with pmax
(vectorized
option to get the max
per row), then grouped by 'telefono1', use slice_max
on the 'max_value' column to return only that row where the 'max_value' is max, ungroup
and remove the 'max_value' column
library(dplyr)
library(purrr)
df1 %>%
mutate(max_value = invoke(pmax, select(., starts_with('ads')))) %>%
group_by(telefono1) %>%
slice_max(max_value) %>%
# or use
# slice(which.max(max_value)) %>%
ungroup %>%
select(-max_value)
# A tibble: 3 x 11
# nombre telefono1 telefono2 ads_ao ads_am ads_mas ads_ma total_price_ao total_price_am total_price_mas. total_price_ma
# <chr> <int> <lgl> <int> <int> <int> <int> <int> <int> <int> <int>
#1 CRIST~ 600040400 NA 0 61 0 0 0 0 0 0
#2 Diego 600530844 FALSE 0 0 0 2 0 0 0 22500
#3 TALLE~ 600674848 NA 0 311 0 0 0 0 0 0
If the dataset is already grouped, ungroup
it first before creating the max_value
column
df1 %>%
ungroup %>%
mutate(max_value = invoke(pmax, select(., starts_with('ads')))) %>%
group_by(telefono1) %>%
slice(which.max(max_value))
Or using base R
mx <- do.call(pmax, df1[startsWith(names(df1), 'ads')])
subset(df1, mx == ave(mx, telefono1, FUN = max))
# nombre telefono1 telefono2 ads_ao ads_am ads_mas ads_ma total_price_ao total_price_am total_price_mas. total_price_ma
#2 CRIST~ 600040400 NA 0 61 0 0 0 0 0 0
#5 Diego 600530844 FALSE 0 0 0 2 0 0 0 22500
#7 TALLE~ 600674848 NA 0 311 0 0 0 0 0 0
df1 <- structure(list(nombre = c("CRIST~", "CRIST~", "CRIST~", "DIEGO~",
"Diego", "TALLE~", "TALLE~", "Rub?n"), telefono1 = c(600040400L,
600040400L, 600040400L, 600530844L, 600530844L, 600674848L, 600674848L,
600674848L), telefono2 = c(NA, NA, NA, NA, FALSE, NA, NA, FALSE
), ads_ao = c(0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L), ads_am = c(0L,
61L, 0L, 0L, 0L, 0L, 311L, 0L), ads_mas = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), ads_ma = c(0L, 0L, 0L, 0L, 2L, 0L, 0L, 2L), total_price_ao = c(0L,
0L, 0L, 0L, 0L, 59000L, 0L, 0L), total_price_am = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), total_price_mas. = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), total_price_ma = c(0L, 0L, 0L, 0L, 22500L, 0L, 0L,
59000L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8"))
Upvotes: 1
Reputation: 39858
One dplyr
and purrr
option could be:
df %>%
group_by(telefono1) %>%
slice_max(pmap_dbl(across(starts_with("ads")), max), 1)
nombre telefono1 telefono2 ads_ao ads_am ads_mas ads_ma total_price_ao total_price_am total_price_mas. total_price_ma
<chr> <int> <lgl> <int> <int> <int> <int> <int> <int> <int> <int>
1 CRIST~ 600040400 NA 0 61 0 0 0 0 0 0
2 Diego 600530844 FALSE 0 0 0 2 0 0 0 22500
3 TALLE~ 600674848 NA 0 311 0 0 0 0 0 0
Upvotes: 1