Print rows where a substring of two columns matches

Question

I want to grep the rows where the string before the . in columns V1 and V2 is the same within a row. For instance, in the example below, row 1 would be such a case.

I guess I need to use gsub somehow, combined with ==, something like gsub( ".*$", "", out )

> head(out)
                         V1                        V2                V3   V4
1  hsa-miR-99b-5p.dataSerum hsa-miR-99b-5p.dataTissue 0.261887741880618 
2 hsa-miR-99b-3p.dataTissue hsa-miR-99b-5p.dataTissue 0.979410208303266 
3 hsa-miR-99b-3p.dataTissue  hsa-miR-99b-5p.dataSerum 0.266705152258623 
4  hsa-miR-99b-3p.dataSerum hsa-miR-99b-5p.dataTissue 0.227329471105902 
5  hsa-miR-99b-3p.dataSerum  hsa-miR-99b-5p.dataSerum 0.944112218530823 
6  hsa-miR-99b-3p.dataSerum hsa-miR-99b-3p.dataTissue  0.20025336348038

akrun · Accepted Answer

We can try sub. The pattern matches a literal dot (\.) followed by zero or more characters (.*), i.e. everything from the first dot to the end of the string, and replaces it with '' for columns 'V1' and 'V2'. Then use == on the two results to get a logical index and subset the rows.

v1 <- sub('\\..*', '', out$V1)
v2 <- sub('\\..*', '', out$V2)

out[v1==v2,]
#                      V1                        V2        V3   V4
#1 hsa-miR-99b-5p.dataSerum hsa-miR-99b-5p.dataTissue 0.2618877 
#6 hsa-miR-99b-3p.dataSerum hsa-miR-99b-3p.dataTissue 0.2002534
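
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The toy data frame only mimics three rows of `out` (V3 values are rounded and V4 is left out), and `prefix` is a hypothetical helper name, not something from the question. Wrapping the comparison in which() is an extra precaution, assuming the columns might contain NA values, so that NA comparisons are dropped instead of producing NA rows.

```r
# Toy data mirroring three rows of `out` (assumed values, V4 omitted)
out <- data.frame(
  V1 = c("hsa-miR-99b-5p.dataSerum", "hsa-miR-99b-3p.dataTissue",
         "hsa-miR-99b-3p.dataSerum"),
  V2 = c("hsa-miR-99b-5p.dataTissue", "hsa-miR-99b-5p.dataTissue",
         "hsa-miR-99b-3p.dataTissue"),
  V3 = c(0.2619, 0.9794, 0.2003),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep everything before the first literal dot.
# The backslash is doubled because it sits inside an R string.
prefix <- function(x) sub("\\..*", "", x)

# TRUE where the part before the dot agrees in V1 and V2
same <- prefix(out$V1) == prefix(out$V2)

# which() turns the logical vector into row numbers and skips NAs
matched <- out[which(same), ]
print(matched)
```

In the toy data, rows 1 and 3 share the same prefix in both columns, so those are the rows kept.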
