Reputation: 73
I have two large data frames df1 and df2. I just want to keep only those columns with df2 which match the first part of column names ( which represent companies names) of df1. That means I want to subset df2 by matching companies name from df1. The files are like this :
df1
date Com1 - PI Com3 - PI Com6 - PI
1/1/2007 292782.4 101780.7 2406010
1/2/2007 292782.4 101780.7 2406010
1/3/2007 292782.4 101780.7 2406010
df2
date Com1 - CV Com2- CV Com3 - CV Com4 - CV Com5- CV Com6 - CV
6/6/1990 1.57 NA 5.3 NA NA 4.51
6/7/1990 1.57 NA 5.2 NA NA 4.51
6/8/1990 1.59 NA 5.4 NA NA 4.51
6/11/1990 1.59 NA 5.2 NA NA 4.51
Expected output (which is a subset of df2)
date Com1 - CV Com3 - CV Com6 - CV
6/6/1990 1.57 5.3 4.51
6/7/1990 1.57 5.2 4.51
6/8/1990 1.59 5.4 4.51
6/11/1990 1.59 5.2 4.51
As a new learner of r , I would be grateful if you can help me in this regard.
Upvotes: 1
Views: 71
Reputation: 887891
Remove the part after the first word with regex and then use %in%
to get an exact match
df2[sub("\\s*-\\s*.*", "", names(df2)) %in% sub("\\s*-\\s*.*", "", names(df1))]
# date Com1 - CV Com3 - CV Com6 - CV
#1 6/6/1990 1.57 5.3 4.51
#2 6/7/1990 1.57 5.2 4.51
#3 6/8/1990 1.59 5.4 4.51
#4 6/11/1990 1.59 5.2 4.51
Upvotes: 1