Reputation: 13
I have two tibbles and need to index the data in one tibble and insert some specific data in the other tibble based on a variable in the first tibble.
I have two tibbles:
library(dplyr)
# Set seed
set.seed(10)
# Get data
df1 <- starwars %>%
select(name,species) %>%
filter(species %in% c("Human","Wookiee","Droid")) %>%
mutate(Fav_colour = sample(c("blue","red","green"),n(),replace=TRUE))
# Random table with typical colour preference
df2 <- tibble(Colour = c("blue","green","red"),
Human = c(0.5,0.3,0.1),
Wookiee = c(0.2,0.8,0.5),
Droid = c(0.3,0.1,0.5))
In df1 I need to insert the typical colour preference based on the species. To do this I can iterate through each row of the tibble in a for loop, add the relevant data, then compile into a list.
# Make empty list
data <- list()
# Iterate through each row
for (x in 1:nrow(df1)) {
# Take a slice
tmp <- slice(df1, x)
# Add new column to slice based on data in slice (species)
tmp$Typical <- df2 %>%
select(Colour,tmp$species) %>%
arrange(desc(.data[[tmp$species]])) %>%
slice_head(n = 1) %>%
select(Colour) %>%
as.character()
#Add data to list
data[[x]] <- tmp
}
#Recompile df1
df1 <- list.rbind(data)
I think there must be a more efficient way to do this, but can't figure out how to obtain filtered and arranged values from df2 without putting it through a for loop. I don't know how to do this, but is sapply with a function perhaps better option? What is the dplyr way of doing this without a for loop?
Upvotes: 1
Views: 57
Reputation: 3447
Please check the alternate approach without the use of loop, check the df4 dataframe
df3 <- df2 %>%
pivot_longer(c('Human','Wookiee','Droid'),names_to = 'species', values_to = 'score') %>%
arrange(species, desc(score)) %>%
group_by(species) %>% slice(1)
df4 <- df1 %>% left_join(df3, by='species') %>% rename(Typical = Colour) %>% select(-score)
Upvotes: 0
Reputation: 66765
It sounds like you want from df2
the largest value per species. If we pivot_longer
to make the species be specified in one column, and the value in another, we can group by species and keep the largest value. This lookup table (with colour + value per species) can be joined to the original data.
df1 %>%
left_join(
df2 %>%
tidyr::pivot_longer(2:4, names_to = "species") %>%
group_by(species) %>%
slice_max(value)
)
Result
Joining with `by = join_by(species)`
# A tibble: 43 × 5
name species Fav_colour Colour value
<chr> <chr> <chr> <chr> <dbl>
1 Luke Skywalker Human green blue 0.5
2 C-3PO Droid blue red 0.5
3 R2-D2 Droid red red 0.5
4 Darth Vader Human green blue 0.5
5 Leia Organa Human red blue 0.5
6 Owen Lars Human green blue 0.5
7 Beru Whitesun lars Human green blue 0.5
8 R5-D4 Droid green red 0.5
9 Biggs Darklighter Human green blue 0.5
10 Obi-Wan Kenobi Human green blue 0.5
# … with 33 more rows
# ℹ Use `print(n = ...)` to see more rows
Upvotes: 2