I am trying to determine sequence similarity.
I would like to create a function to compare data frame elements, using the following example:
V1 V2 V3 V4
1 C D A D
2 A A S E
3 V T T V
4 A T S S
5 C D R Y
6 C A D V
7 V T E T
8 A T A A
9 R V V W
10 W R D D
I want to compare the first element of the first column with the first element of the second column: if they match, the result is 1, otherwise 0. Then the second element of the first column is compared with the second element of the second column, and so on.
For example:
C != D -----0
A == A -----1
In the same way, I would like to compare column 1 with columns 2, 3 and 4, then column 2 with columns 3 and 4, and finally column 3 with column 4.
For the column 1 vs column 2 comparison, the output would be just the numbers:
0
1
0
0
0
0
0
0
0
0
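To make the example reproducible, here is a sketch that rebuilds the data frame from the table above, assuming the letters are stored as character columns (stringsAsFactors = FALSE) and reusing the name df_trial from the attempts below. Comparing V1 with V2 element-wise and converting the logical result to integers gives the vector listed above.

```r
# Rebuild the example data frame (columns V1 to V4, rows 1 to 10)
df_trial <- data.frame(
  V1 = c("C", "A", "V", "A", "C", "C", "V", "A", "R", "W"),
  V2 = c("D", "A", "T", "T", "D", "A", "T", "T", "V", "R"),
  V3 = c("A", "S", "T", "S", "R", "D", "E", "A", "V", "D"),
  V4 = c("D", "E", "V", "S", "Y", "V", "T", "A", "W", "D"),
  stringsAsFactors = FALSE
)

# Element-wise comparison of V1 with V2: TRUE/FALSE becomes 1/0
v1_vs_v2 <- as.integer(df_trial$V1 == df_trial$V2)
print(v1_vs_v2)
# [1] 0 1 0 0 0 0 0 0 0 0
```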
I tried the following, but it doesn't work:
compared_df <- ifelse(df_trial$V1==df_trial$V2,1,ifelse(df_trial$V1==df_trial$V2,0,NA))
compared_df
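The nested ifelse above tests the same condition twice. When V1 and V2 differ, the outer test is FALSE, so the inner ifelse is used; its test is FALSE as well, so the result falls through to NA instead of 0. A single ifelse is enough. This sketch uses only columns V1 and V2 from the example, stored as character vectors (an assumption).

```r
df_trial <- data.frame(
  V1 = c("C", "A", "V", "A", "C", "C", "V", "A", "R", "W"),
  V2 = c("D", "A", "T", "T", "D", "A", "T", "T", "V", "R"),
  stringsAsFactors = FALSE
)

# Original attempt: the inner ifelse repeats the outer condition,
# so every non-matching row ends up as NA rather than 0
nested <- ifelse(df_trial$V1 == df_trial$V2, 1, ifelse(df_trial$V1 == df_trial$V2, 0, NA))
print(nested)
# [1] NA  1 NA NA NA NA NA NA NA NA

# Corrected: one test, 1 for a match and 0 otherwise
compared_df <- ifelse(df_trial$V1 == df_trial$V2, 1, 0)
print(compared_df)
# [1] 0 1 0 0 0 0 0 0 0 0
```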
As suggested, I tried the following:
compared_df1 <- df_trial$matches <- as.integer(df_trial$V1 == df_trial$V2)
This works well for comparing a single pair of columns. Is there a way to compare more globally, i.e. all pairs of the updated columns V1 to V4?
As @Ronak Shah said in the comments, the following is sufficient if you want to compare two columns:
df$matches <- as.integer(df$V1 == df$V2)
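Another option, which also works for more than two columns, is to use apply to count the unique elements in each row; the result is 1 only when every column in that row holds the same value. Restricting apply to the data columns keeps a matches column added earlier out of the comparison:
df$matches <- apply(df[, c("V1", "V2", "V3", "V4")], 1, function(x) as.integer(length(unique(x)) == 1))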
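The apply approach flags rows where all columns are identical, which is stricter than the pairwise scheme in the question (V1 with V2, V3, V4; V2 with V3, V4; V3 with V4). A minimal sketch for the pairwise version uses base R's combn to enumerate the column pairs and compares each pair element-wise. The names pairs and matches, the V1_V2-style column labels, and storing the letters as character columns are illustrative assumptions.

```r
df_trial <- data.frame(
  V1 = c("C", "A", "V", "A", "C", "C", "V", "A", "R", "W"),
  V2 = c("D", "A", "T", "T", "D", "A", "T", "T", "V", "R"),
  V3 = c("A", "S", "T", "S", "R", "D", "E", "A", "V", "D"),
  V4 = c("D", "E", "V", "S", "Y", "V", "T", "A", "W", "D"),
  stringsAsFactors = FALSE
)

# All pairs of column names as a 2 x 6 matrix:
# V1-V2, V1-V3, V1-V4, V2-V3, V2-V4, V3-V4
pairs <- combn(names(df_trial), 2)

# For each pair, compare the two columns element-wise (1 = match, 0 = no match);
# apply over the columns of 'pairs' returns a 10 x 6 matrix, one column per pair
matches <- apply(pairs, 2, function(p) as.integer(df_trial[[p[1]]] == df_trial[[p[2]]]))
colnames(matches) <- paste(pairs[1, ], pairs[2, ], sep = "_")

print(matches)
# Number of matching rows per pair: 1 1 2 2 2 3
print(colSums(matches))
```

Each column of matches holds the 0/1 vector for one pair, so matches[, "V1_V2"] reproduces the output listed in the question.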