Dr. Richard Tennen
Dr. Richard Tennen

Reputation: 267

Update observations in dataframe according to other dataframe?

I have a dataframe say prod_score:

product score
a       1
d       2
ff      2
e       3
fvf     1

I have another dataframe prod_rank with same products + their rank prod_rank:

product   rank
a         11
d         4
ff        1
e         5
fvf       9

Just to clarify I have a lot of observations that's why I show sample data.

Filtering all the products with score 2:

library(dplyr)
prod_scr_2 <- prod_score %>% filter(score == 2)

Now I want to take prod_scr_2 products and update the score according to the prod_rank df:

I have used join:

decision_tbl <- inner_join(prod_scr_2, prod_rank, by = "product") %>%
                                                top_n(2,desc(rank))

Now I am taking decision_tbl$product and want to update the score of only the products that get top rank.

I have used match to do this:

prods2update_idx <- match(decision_tbl$product, prod_score$product)

Given match indexes I am trying to update the prod_score dataframe, please advise how can I do this?

Upvotes: 0

Views: 28

Answers (1)

AntoniosK
AntoniosK

Reputation: 16121

Assume that the score of interest is 2 (as you mentioned in your example) and the updated score for the products with top rank is 100. Those can be changed.

Here's a dplyr solution, because I saw you started working using this package:

library(dplyr)

prod_score = read.table(text = "
product score
a       1
d       2
ff      2
e       3
fvf     1
", header = T, stringsAsFactors = F)

prod_rank = read.table(text = "
product   rank
a         11
d         4
ff        1
e         5
fvf       9
", header = T, stringsAsFactors = F)


prod_score %>% 
  filter(score == 2) %>%                                 # select products with score = 2
  inner_join(prod_rank, by = "product") %>%              # join to get ranks
  filter(rank == max(rank)) %>%                          # keep product(s) with maximum ranks
  rename(given_score = score) %>%                        # change column name (for the next join)
  right_join(prod_score, by = "product") %>%             # join to get scores
  mutate(score = ifelse(!is.na(rank), 100, score)) %>%   # update score when there's a rank value
  select(-given_score, -rank)                            # remove unnecessary columns

#   product score
# 1       a     1
# 2       d   100
# 3      ff     2
# 4       e     3
# 5     fvf     1

And an alternative approach in base R. Remember to re-build initial example datasets:

# get products with score = 2
prod_score$product[prod_score$score == 2] -> prds_score_2

# get ranks for those products
prod_rank[prod_rank$product %in% prds_score_2,] -> prds_score_2_ranks

# keep products with maximum rank to update
prds_score_2_ranks$product[prds_score_2_ranks$rank == max(prds_score_2_ranks$rank)] -> prds_to_update

# update values for those products in your initial table
prod_score$score[prod_score$product %in% prds_to_update] = 100

# see the updates
prod_score

#   product score
# 1       a     1
# 2       d   100
# 3      ff     2
# 4       e     3
# 5     fvf     1

Upvotes: 1

Related Questions