DaniCee

Reputation: 3207

Substitute specific values in a dataframe by matching strings stored in another dataframe

Say I have a data frame like the following:

mydf=data.frame(id=LETTERS, value=runif(26,0,1), match1=sample(c(0,1),26,replace=TRUE), match2=sample(c(0,2),26,replace=TRUE), match3=sample(c(0,3),26,replace=TRUE), all_matches=sample(0:3,26,replace=TRUE))

which looks like:

> mydf
   id       value match1 match2 match3 all_matches
1   A 0.267675256      1      0      0           0
2   B 0.974518682      1      0      3           3
3   C 0.175529131      1      2      3           0
4   D 0.050552174      0      2      0           0
5   E 0.228286981      0      0      0           1
6   F 0.025520208      0      2      3           1
7   G 0.206697937      1      2      0           2
8   H 0.644523511      0      2      3           2
9   I 0.342110147      0      0      3           3
10  J 0.430250450      1      0      0           1
...

The match1 column contains only 0 and 1, match2 only 0 and 2, match3 only 0 and 3, and all_matches any value from 0 to 3.

All I want to do is replace the values 1, 2, and 3 in those columns with the name associated with each value, which is stored in another data frame:

match_df=data.frame(match=1:3, name=c('ABC','XYZ','IJK'))

which looks like this:

> match_df
  match name
1     1  ABC
2     2  XYZ
3     3  IJK

What would be the best way to replace the values 1, 2, 3 in columns match1, match2, match3, and all_matches of mydf with the corresponding names from match_df (turning the value 0 into NA)?

So far I'm merging match_df onto each column of interest in mydf inside a for loop, but I'm sure this can be done better, ideally in one line of code.

Any help appreciated! Thanks!

Upvotes: 1

Views: 89

Answers (2)

Maël

Reputation: 52049

A one-liner with match:

mydf[-c(1,2)] <- match_df$name[match(unlist(mydf[-c(1,2)]), match_df$match)]

output

#    id      value match1 match2 match3 all_matches
# 1   A 0.17599087    ABC   <NA>   <NA>        <NA>
# 2   B 0.45899500   <NA>    XYZ   <NA>         XYZ
# 3   C 0.12762547    ABC   <NA>   <NA>         XYZ
# 4   D 0.67893265   <NA>    XYZ    IJK         IJK
# 5   E 0.64393827   <NA>   <NA>   <NA>        <NA>
# 6   F 0.93755603   <NA>   <NA>   <NA>         ABC
# 7   G 0.70161939    ABC    XYZ   <NA>        <NA>
# 8   H 0.81897072   <NA>   <NA>    IJK         XYZ
# 9   I 0.26734462   <NA>    XYZ    IJK         ABC
# 10  J 0.03569294   <NA>    XYZ    IJK        <NA>
# 11  K 0.08168074   <NA>   <NA>    IJK         IJK
# 12  L 0.67863032   <NA>   <NA>    IJK         ABC
# 13  M 0.79585738   <NA>    XYZ   <NA>         IJK
# 14  N 0.48506734    ABC    XYZ   <NA>         IJK
# 15  O 0.56177191    ABC   <NA>    IJK        <NA>
# 16  P 0.50113968    ABC    XYZ   <NA>        <NA>
# 17  Q 0.74527715   <NA>   <NA>   <NA>         XYZ
# 18  R 0.64572526   <NA>   <NA>   <NA>        <NA>
# 19  S 0.27640699   <NA>    XYZ    IJK         XYZ
# 20  T 0.76158656   <NA>    XYZ   <NA>         XYZ
# 21  U 0.44533420   <NA>   <NA>    IJK         IJK
# 22  V 0.17232906   <NA>   <NA>    IJK        <NA>
# 23  W 0.87758234    ABC    XYZ   <NA>         ABC
# 24  X 0.15478237   <NA>   <NA>    IJK        <NA>
# 25  Y 0.80055561   <NA>    XYZ    IJK         XYZ
# 26  Z 0.80190420    ABC   <NA>    IJK         ABC
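
To see why the one-liner works, here is a minimal sketch of the same lookup on a single vector. The `codes` vector is a hypothetical input made up for illustration, and `stringsAsFactors = FALSE` is set only so the result is a character vector on older R versions too.

```r
# Lookup table from the question
match_df <- data.frame(match = 1:3, name = c("ABC", "XYZ", "IJK"),
                       stringsAsFactors = FALSE)

# Hypothetical codes, as they might appear in one of the match columns
codes <- c(0, 1, 3, 2, 0)

# match() gives the position of each code in match_df$match, or NA if absent
pos <- match(codes, match_df$match)
pos
# NA 1 3 2 NA

# Indexing the names by position carries the NAs through,
# so a code of 0 (not in the table) ends up as NA
labels <- match_df$name[pos]
labels
# NA "ABC" "IJK" "XYZ" NA
```

In the one-liner, `unlist()` stacks the four columns into one long vector, and assigning the result back to `mydf[-c(1,2)]` splits it across the same four columns again.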

Upvotes: 2

Onyambu

Reputation: 79238

library(dplyr)   # mutate(), across(), recode()
library(tibble)  # deframe()

mydf %>%
  mutate(across(contains('match'),~recode(.x,!!!deframe(match_df))))

   id      value match1 match2 match3 all_matches
1   A 0.26767526    ABC   <NA>   <NA>        <NA>
2   B 0.97451868    ABC   <NA>    IJK         IJK
3   C 0.17552913    ABC    XYZ    IJK        <NA>
4   D 0.05055217   <NA>    XYZ   <NA>        <NA>
5   E 0.22828698   <NA>   <NA>   <NA>         ABC
6   F 0.02552021   <NA>    XYZ    IJK         ABC
7   G 0.20669794    ABC    XYZ   <NA>         XYZ
8   H 0.64452351   <NA>    XYZ    IJK         XYZ
9   I 0.34211015   <NA>   <NA>    IJK         IJK
10  J 0.43025045    ABC   <NA>   <NA>         ABC
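
For comparison, the same per-column recoding can be sketched in base R with a named lookup vector instead of `recode()`. This is an alternative technique, not what the code above does internally. The `toy` data frame is a hypothetical three-row stand-in for `mydf`.

```r
# Lookup table from the question
match_df <- data.frame(match = 1:3, name = c("ABC", "XYZ", "IJK"),
                       stringsAsFactors = FALSE)

# Hypothetical small stand-in for mydf
toy <- data.frame(id = c("A", "B", "C"),
                  match1 = c(1, 0, 1),
                  all_matches = c(3, 2, 0))

# Named vector: names are the codes ("1", "2", "3"), values are the labels
lookup <- setNames(match_df$name, match_df$match)

# Recode every column whose name contains "match";
# codes without an entry in lookup (here 0) become NA
cols <- grep("match", names(toy))
toy[cols] <- lapply(toy[cols], function(x) unname(lookup[as.character(x)]))
toy
```

Here `match1` becomes ABC, NA, ABC and `all_matches` becomes IJK, XYZ, NA, while `id` is left untouched.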

Upvotes: 4
