Reputation: 25
I have a database like the following
Var1 Var2 Var3 X Y Z
VIX SPX VOL 2 3 4
SPX VIX NA 4 4 NA
SPX NA NA 2 NA NA
I want to isolate the first row with inputs from user as VIX, SPX and VOL. However, in a database like this I do not know in which combination these variables are kept in.
That is we might have VIX as Var1, SPX as Var2 and VOL as Var3 or we might have VOL as Var1, SPX as Var2 and VIX as Var3. In such a case 6 combinations are possible.
I can concatenate strings to create all 6 possibilities in R and do a row look up. But I am looking for an easier algorithm.
I am trying to use subset in to filter one column after another but that is equally cumbersome. Is there an easier way out in R
Upvotes: 2
Views: 94
Reputation: 388982
We can use apply
row-wise for the first three columns and check with grepl
if it has any of the three possible values and finally using all
we verify that all the three columns for that row has the accepted values.
df[apply(df[1:3], 1, function(x) all(grepl("VIX|SPX|VOL", x))), ]
# Var1 Var2 Var3 X Y Z
#1 VIX SPX VOL 2 3 4
Or as @Cath mentioned in the comments using the above can result into selecting some unnecessary rows. We can change it to
df[apply(df[1:3], 1, function(x) all(c("VIX", "SPX", "VOL") %in% x)), ]
to ensure that all three unique values are present in the three columns.
Upvotes: 5
Reputation: 24074
You can also try with data.table
package:
library(data.table)
setDT(df)[df[, all(c("VIX", "SPX", "VOL") %in% c(Var1, Var2, Var3)), by=1:3]$V1]
# Var1 Var2 Var3 X Y Z
#1: VIX SPX VOL 2 3 4
The condition all(c("VIX", "SPX", "VOL") %in% c(Var1, Var2, Var3)
permits to ensure that all 3 values are present among the 3 variables (avoiding to select a row with, for example, VIX VIX VOL
, in case that may happen).
Upvotes: 2
Reputation: 887118
We can use vectorized rowSums
df1[rowSums(Reduce(`|`, lapply(c("VIX", "SPX", "VOL"), `==`, df1[1:3])), na.rm = TRUE)==3,]
# Var1 Var2 Var3 X Y Z
#1 VIX SPX VOL 2 3 4
Or another vectorized method instead of looping through rows.
df1[!rowSums(Vectorize(function(x) !x%in% c("VIX", "SPX", "VOL"))(df1[1:3])),]
# Var1 Var2 Var3 X Y Z
#1 VIX SPX VOL 2 3 4
Upvotes: 0