R dataframe join by column name

Question

I have a rather unusual problem (I believe): I am trying to join two data frames where the join criterion is the column name, not a value. Let me explain with an example. Here is the head of my prediction data frame (multiclass predictions):

> head(mnm.predict.test.probs)
              1            2          3
9  1.013755e-04 3.713862e-02 0.96276001
10 1.904435e-11 3.153587e-02 0.96846413
12 6.445101e-23 1.119782e-11 1.00000000
13 1.238355e-04 2.882145e-02 0.97105472
22 9.027254e-01 7.259787e-07 0.09727389
26 1.365667e-01 4.034372e-01 0.45999610

And here is the head of the response column:

> head(testing.logist$cut.rank)
[1] 3 3 3 3 1 3

For each row, the join should look up the probability in the first data frame from the column whose name matches the corresponding value in the second. For instance, the result (a vector or list) should look like this:

0.96276001
0.96846413
1.00000000
0.97105472
0.90272540
0.45999610

Any idea how to do that efficiently?
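For reference, the lookup described in the question can be written as an explicit loop. The sketch below rebuilds the six rows shown above, assuming the predictions are a numeric matrix (the question calls it a data frame; this row-and-column lookup works on either), and for each row picks the column whose name matches the response value. It reproduces the expected values, but it loops at R level, which is exactly what the question hopes to avoid:

```r
# Rebuild the six rows shown in the question (values copied from its head() output).
# Assumption: the predictions are a numeric matrix with columns named "1", "2", "3".
mnm.predict.test.probs <- matrix(
  c(1.013755e-04, 3.713862e-02, 0.96276001,
    1.904435e-11, 3.153587e-02, 0.96846413,
    6.445101e-23, 1.119782e-11, 1.00000000,
    1.238355e-04, 2.882145e-02, 0.97105472,
    9.027254e-01, 7.259787e-07, 0.09727389,
    1.365667e-01, 4.034372e-01, 0.45999610),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("9", "10", "12", "13", "22", "26"), c("1", "2", "3"))
)
testing.logist <- data.frame(cut.rank = c(3, 3, 3, 3, 1, 3))

# Explicit row-by-row lookup: for row i, take the column whose name
# equals the i-th response value.
result <- numeric(nrow(mnm.predict.test.probs))
for (i in seq_len(nrow(mnm.predict.test.probs))) {
  result[i] <- mnm.predict.test.probs[i, as.character(testing.logist$cut.rank[i])]
}
print(result)
```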

nicola · Accepted Answer

The [ subset operator also accepts a two-column matrix as its argument, in which each row gives the row and column indices of one element you want to extract. Try this:

mnm.predict.test.probs[cbind(1:nrow(mnm.predict.test.probs), testing.logist$cut.rank)]
#[1] 0.9627600 0.9684641 1.0000000 0.9710547 0.9027254 0.4599961
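The cbind() above uses each cut.rank value directly as a column position, which picks the right column here only because the columns are named 1, 2, 3 in that order. If the lookup should go by column name, as the question's title suggests, a minimal sketch (again assuming a numeric matrix with column names, rebuilt from the six rows in the question) first converts the values to column positions with match():

```r
# Six rows from the question, assumed to be a numeric matrix with named columns.
mnm.predict.test.probs <- matrix(
  c(1.013755e-04, 3.713862e-02, 0.96276001,
    1.904435e-11, 3.153587e-02, 0.96846413,
    6.445101e-23, 1.119782e-11, 1.00000000,
    1.238355e-04, 2.882145e-02, 0.97105472,
    9.027254e-01, 7.259787e-07, 0.09727389,
    1.365667e-01, 4.034372e-01, 0.45999610),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("9", "10", "12", "13", "22", "26"), c("1", "2", "3"))
)
testing.logist <- data.frame(cut.rank = c(3, 3, 3, 3, 1, 3))

# Turn each response value into the position of the column with that name.
# as.character() also means a factor cut.rank would be matched by its labels,
# not by its internal integer codes.
col.idx <- match(as.character(testing.logist$cut.rank),
                 colnames(mnm.predict.test.probs))

# Two-column index matrix: one (row, column) pair per element to extract.
idx <- cbind(seq_len(nrow(mnm.predict.test.probs)), col.idx)

picked <- mnm.predict.test.probs[idx]
print(picked)
```

Because the column positions come from the names, this keeps working if the columns are reordered.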

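Because [ is a primitive that performs all the lookups in a single vectorized call, this is much faster than a for loop or an *apply-based solution, both of which extract one element per R-level iteration.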
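The speed claim can be checked with a rough sketch like the following. It uses hypothetical simulated data (100,000 rows, three classes) and compares the matrix-index lookup with a row-by-row sapply(). Timings depend on the machine and the data size, so none are quoted here; the stopifnot() line confirms that both approaches return the same values.

```r
set.seed(1)
n <- 1e5   # hypothetical number of rows
k <- 3     # hypothetical number of classes

# Simulated stand-ins for the prediction matrix and the response values.
probs <- matrix(runif(n * k), nrow = n,
                dimnames = list(NULL, as.character(seq_len(k))))
cls <- sample(seq_len(k), n, replace = TRUE)

# Vectorized lookup: one call to [ with a two-column index matrix.
t.index <- system.time(
  via.index <- probs[cbind(seq_len(n), cls)]
)

# Row-by-row lookup with sapply(): one R-level function call per element.
t.sapply <- system.time(
  via.sapply <- unname(sapply(seq_len(n), function(i) probs[i, cls[i]]))
)

stopifnot(identical(via.index, via.sapply))
cat("matrix index:", t.index[["elapsed"]], "s\n")
cat("sapply:      ", t.sapply[["elapsed"]], "s\n")
```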
