subset values in a df based on values in another df

Question

I have two data frames, both with dimensions 54 x 210 (dim() returns [1] 54 210). One (let's call it dfx) contains 1s and 0s that mark each answer on a test as correct or incorrect. dfy contains the response time for each of these questions. I'd like to subset (or perhaps merge()) all items from dfy whose corresponding entry in dfx is 1. The data is in wide format: the IDs are the row names and each column represents a question.

Example:

dfx

Q1 Q2 Q3 Q4 Q5 …
1  1  1  1  1
1  1  1  1  1
1  1  0  1  1 
1  1  0  1  1

dfy

Q1_3 Q2_3  Q3_3  Q4_3  Q5_3 ...
16.01 8.23 18.13 11.14 18.03
17.25 7.50 11.72 10.84  7.24

I need a dfz that is a subset of dfy in which, if dfx[Q1] == 1, dfy[Q1_3] is returned as dfz[Q1_3]; otherwise the value should be NA or dfx[Q1] (which is 0).

I can do it for a single column if I specify the columns explicitly:

dfz <- cbind(ifelse(dfx$Q1 == 1, dfy$Q1_3, dfx$Q1))

However, I don't know how to apply it to the whole data frame.

Any ideas?
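A minimal sketch of extending the single-column ifelse() to every column, assuming dfx and dfy have their question columns in the same order (the k-th column of dfx flags the k-th column of dfy). The toy data is hypothetical: it reuses values from the question, and the 0 in Q3 is chosen only to show the NA case. Map() walks the column pairs and returns a list, which as.data.frame() turns back into a data frame.

```r
# Hypothetical toy data built from values shown in the question
dfx <- data.frame(Q1 = c(1, 1), Q2 = c(1, 1), Q3 = c(1, 0))
dfy <- data.frame(Q1_3 = c(16.01, 17.25), Q2_3 = c(8.23, 7.50), Q3_3 = c(18.13, 11.72))

# Pair columns by position (assumption: same column order in dfx and dfy)
# and keep the response time only where the flag is 1, NA otherwise
dfz <- as.data.frame(Map(function(flag, rt) ifelse(flag == 1, rt, NA), dfx, dfy))

# Map() takes its names from dfx, so restore dfy's column names
names(dfz) <- names(dfy)
```

Replacing NA with 0 in the ifelse() call gives the 0-filled version instead.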

Barranka · Accepted Answer

If both data frames have the same dimensions, with rows and columns in the same order, and dfx contains only ones and zeros, you can multiply them element-wise to get what you need:

dfz <- dfy * dfx
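To illustrate, here is a small sketch on hypothetical toy data (values taken from the question, with one 0 added to dfx to show the effect). Arithmetic between two equally sized data frames is done cell by cell, matching by position, so the column names of dfx and dfy need not agree but their order must. Since the question also mentions NA, the sketch shows an alternative that assigns NA through the logical matrix produced by dfx != 1.

```r
# Hypothetical toy data: two rows, three questions, one incorrect answer (Q3, row 2)
dfx <- data.frame(Q1 = c(1, 1), Q2 = c(1, 1), Q3 = c(1, 0))
dfy <- data.frame(Q1_3 = c(16.01, 17.25), Q2_3 = c(8.23, 7.50), Q3_3 = c(18.13, 11.72))

# Element-wise product: keeps the response time where dfx is 1, gives 0 where dfx is 0
dfz <- dfy * dfx

# Variant with NA instead of 0: dfx != 1 returns a logical matrix,
# which can be used to index (and replace) cells of dfy directly
dfz_na <- dfy
dfz_na[dfx != 1] <- NA
```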

In your follow-up comment, you ask how to manipulate columns of one data frame based on the values of another data frame. I frequently use the sqldf package for this kind of thing. It lets you manipulate data frames using SQL statements. You'll need an id column that lets you relate your data frames.

A simple example:

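library(sqldf)

# df_a holds the values, df_b the 1/0 flags; both share an id column.
# Keep df_a.q1 where the flag in df_b is 1, otherwise return 0.
sqldf("select df_a.id
            , case
                  when df_b.q1 = 1 then df_a.q1
                  else 0
              end as value
       from df_a
            inner join df_b on df_a.id = df_b.id")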

As you can see, you can join data frames as if they were tables in a database.
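If installing sqldf is not an option, the same join-and-condition logic can be written in base R with merge() and ifelse(). This is a swapped-in base-R equivalent, not what the answer above uses; df_a, df_b and their values are hypothetical toy data, with the rows of df_b deliberately shuffled to show that rows are matched on id rather than on position.

```r
# Hypothetical toy tables: df_a holds values, df_b holds 1/0 flags (rows in a different order)
df_a <- data.frame(id = 1:3, q1 = c(16.01, 17.25, 11.72))
df_b <- data.frame(id = c(3L, 1L, 2L), q1 = c(0, 1, 1))

# Inner join on id (merge's default, all = FALSE); the clashing q1 columns get suffixes
merged <- merge(df_a, df_b, by = "id", suffixes = c("_a", "_b"))

# Equivalent of: case when df_b.q1 = 1 then df_a.q1 else 0 end
result <- data.frame(id = merged$id,
                     value = ifelse(merged$q1_b == 1, merged$q1_a, 0))
```

merge() sorts the result by the key by default (sort = TRUE), whereas the SQL query has no ORDER BY, so its row order is not guaranteed.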

Hope this helps.
