user2300940


Print rows where a substring of two columns matches

I want to select the rows where the string before the "." in column V1 is identical to the string before the "." in column V2 of the same row. For instance, in the example below, row 1 would be such a case (as would row 6).

I guess I need to use gsub somehow, combined with ==, something like gsub( ".*$", "", out ).

> head(out)
                         V1                        V2                V3   V4
1  hsa-miR-99b-5p.dataSerum hsa-miR-99b-5p.dataTissue 0.261887741880618 <NA>
2 hsa-miR-99b-3p.dataTissue hsa-miR-99b-5p.dataTissue 0.979410208303266 <NA>
3 hsa-miR-99b-3p.dataTissue  hsa-miR-99b-5p.dataSerum 0.266705152258623 <NA>
4  hsa-miR-99b-3p.dataSerum hsa-miR-99b-5p.dataTissue 0.227329471105902 <NA>
5  hsa-miR-99b-3p.dataSerum  hsa-miR-99b-5p.dataSerum 0.944112218530823 <NA>
6  hsa-miR-99b-3p.dataSerum hsa-miR-99b-3p.dataTissue  0.20025336348038 <NA>
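To make the question reproducible, the printed rows can be rebuilt as a data frame. This is a sketch that assumes V1 and V2 are character columns and V4 is entirely NA; V3 is copied from the printed values, which may be rounded relative to the real data.

```r
# Rebuild the six rows shown by head(out).
# Assumption: V1/V2 are character, V4 is all NA, V3 taken from the printout.
out <- data.frame(
  V1 = c("hsa-miR-99b-5p.dataSerum", "hsa-miR-99b-3p.dataTissue",
         "hsa-miR-99b-3p.dataTissue", "hsa-miR-99b-3p.dataSerum",
         "hsa-miR-99b-3p.dataSerum", "hsa-miR-99b-3p.dataSerum"),
  V2 = c("hsa-miR-99b-5p.dataTissue", "hsa-miR-99b-5p.dataTissue",
         "hsa-miR-99b-5p.dataSerum", "hsa-miR-99b-5p.dataTissue",
         "hsa-miR-99b-5p.dataSerum", "hsa-miR-99b-3p.dataTissue"),
  V3 = c(0.261887741880618, 0.979410208303266, 0.266705152258623,
         0.227329471105902, 0.944112218530823, 0.20025336348038),
  V4 = NA_character_,          # recycled to all six rows
  stringsAsFactors = FALSE
)
```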


Answers (1)

akrun

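We can use sub. Match a literal dot (\\.) followed by zero or more characters (.*) and replace the match with '' in columns 'V1' and 'V2'. Because the pattern runs from the first dot to the end of the string, there is only one match per value, so sub is enough and gsub is not needed. Then compare the two results with == to get a logical index and use it to subset the rows.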
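A quick check on two values from the question shows what the pattern removes, and why the pattern from the question does not work: without the literal dot, ".*$" matches the whole string, so every value becomes empty.

```r
x <- c("hsa-miR-99b-5p.dataSerum", "hsa-miR-99b-3p.dataTissue")

# '\\.' is a literal dot; '.*' then consumes the rest of the string,
# so everything from the first dot onward is removed.
sub('\\..*', '', x)
# [1] "hsa-miR-99b-5p" "hsa-miR-99b-3p"

# The question's pattern has no literal dot, so '.*$' matches the
# entire value and the result is an empty string.
gsub(".*$", "", x)
# [1] "" ""
```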

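# Keep only the text before the first "." in each column
v1 <- sub('\\..*', '', out$V1)
v2 <- sub('\\..*', '', out$V2)

# Keep the rows where both prefixes are identical
out[v1==v2,]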
#                      V1                        V2        V3   V4
#1 hsa-miR-99b-5p.dataSerum hsa-miR-99b-5p.dataTissue 0.2618877 <NA>
#6 hsa-miR-99b-3p.dataSerum hsa-miR-99b-3p.dataTissue 0.2002534 <NA>
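The same idea can be wrapped in a small reusable function. The name rows_with_same_prefix is hypothetical and not part of the answer; it is a sketch that assumes the columns hold character (or factor) values, and it uses which() so that rows where either column is NA are dropped instead of coming back as all-NA rows, which a plain logical index would produce.

```r
# Hypothetical helper: keep rows of `df` whose columns `a` and `b`
# share the same text before the first dot.
rows_with_same_prefix <- function(df, a, b) {
  # as.character() also handles factor columns; sub() strips "." onward
  prefix <- function(s) sub("\\..*", "", as.character(s))
  keep <- which(prefix(df[[a]]) == prefix(df[[b]]))  # NA comparisons dropped
  df[keep, , drop = FALSE]
}

out <- data.frame(
  V1 = c("hsa-miR-99b-5p.dataSerum", "hsa-miR-99b-3p.dataTissue",
         "hsa-miR-99b-3p.dataSerum"),
  V2 = c("hsa-miR-99b-5p.dataTissue", "hsa-miR-99b-5p.dataSerum",
         "hsa-miR-99b-3p.dataTissue"),
  stringsAsFactors = FALSE
)

res <- rows_with_same_prefix(out, "V1", "V2")
print(res)   # rows 1 and 3 of this small example
```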
