user1679161
user1679161

Reputation: 25

Easier solution for searching across columns in data.frame

I have a database like the following

Var1 Var2 Var3 X Y Z
VIX  SPX  VOL  2 3 4
SPX  VIX  NA   4 4 NA
SPX  NA   NA   2 NA NA

I want to isolate the first row with inputs from user as VIX, SPX and VOL. However, in a database like this I do not know in which combination these variables are kept in.

That is we might have VIX as Var1, SPX as Var2 and VOL as Var3 or we might have VOL as Var1, SPX as Var2 and VIX as Var3. In such a case 6 combinations are possible.

I can concatenate strings to create all 6 possibilities in R and do a row look up. But I am looking for an easier algorithm.

I am trying to use subset in to filter one column after another but that is equally cumbersome. Is there an easier way out in R

Upvotes: 2

Views: 94

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388982

We can use apply row-wise for the first three columns and check with grepl if it has any of the three possible values and finally using all we verify that all the three columns for that row has the accepted values.

df[apply(df[1:3], 1, function(x) all(grepl("VIX|SPX|VOL", x))), ]

#   Var1 Var2 Var3 X Y Z
#1  VIX  SPX  VOL  2 3 4

Or as @Cath mentioned in the comments using the above can result into selecting some unnecessary rows. We can change it to

df[apply(df[1:3], 1, function(x) all(c("VIX", "SPX", "VOL") %in% x)), ]

to ensure that all three unique values are present in the three columns.

Upvotes: 5

Cath
Cath

Reputation: 24074

You can also try with data.table package:

library(data.table)
setDT(df)[df[, all(c("VIX", "SPX", "VOL") %in% c(Var1, Var2, Var3)), by=1:3]$V1]
#   Var1 Var2 Var3 X Y Z
#1:  VIX  SPX  VOL 2 3 4

The condition all(c("VIX", "SPX", "VOL") %in% c(Var1, Var2, Var3) permits to ensure that all 3 values are present among the 3 variables (avoiding to select a row with, for example, VIX VIX VOL, in case that may happen).

Upvotes: 2

akrun
akrun

Reputation: 887118

We can use vectorized rowSums

df1[rowSums(Reduce(`|`, lapply(c("VIX", "SPX", "VOL"), `==`, df1[1:3])), na.rm = TRUE)==3,]
#  Var1 Var2 Var3 X Y Z
#1  VIX  SPX  VOL 2 3 4

Or another vectorized method instead of looping through rows.

df1[!rowSums(Vectorize(function(x) !x%in% c("VIX", "SPX", "VOL"))(df1[1:3])),]
#  Var1 Var2 Var3 X Y Z
#1  VIX  SPX  VOL 2 3 4

Upvotes: 0

Related Questions