Reputation: 262
I have a huge list of company names. As illustrated below, if name company is ABBEYCREST.DEAD...10.10.14...ASK.PRICE, this means ABBEYCREST.DEAD...10.10.14... is name of company and ASK.PRICE is ASK Price data and when it ends with BID.PRICE is means its the BID PRICE data. I want to identify the company whose only one column name is avaiable in the dataframe. Actually I have a dataframe which has colum headers as illustrated below, implying each company should have 2 columns, if there are 4000 companies so there should be 8000 columns in my dataframe but I have 7999 ( although my dataframe has a date column but I exclude it when I count columns).
df<-AskBid
ABBEYCREST.DEAD...10.10.14...ASK.PRICE
ABBEYCREST.DEAD...10.10.14...BID.PRICE
ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
ABERTIS..IRS....BID.PRICE
ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
ABLON.GROUP.DEAD...31.05.13...BID.PRICE
ACAMBIS.DEAD...25.09.08...ASK.PRICE
ACAMBIS.DEAD...25.09.08...BID.PRICE
I want to find is
missing <- df
ABERTIS..IRS....BID.PRICE
I would really appreciate your help. This is causing problems in my estimations.
Upvotes: 0
Views: 146
Reputation: 10473
Here is another solution assuming text is the column name in the data frame read in:
library(dplyr)
df$text <- gsub(("(ASK|BID)", "", df$text)
df %>% group_by(text) %>% filter(n() != 2)
Upvotes: 0
Reputation: 25726
You can remove the ASK.PRICE
and BID.PRICE
part and call duplicated
twice (the second time on the reversed order):
cn <- readLines(textConnection(
"ABBEYCREST.DEAD...10.10.14...ASK.PRICE
ABBEYCREST.DEAD...10.10.14...BID.PRICE
ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
ABERTIS..IRS....BID.PRICE
ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
ABLON.GROUP.DEAD...31.05.13...BID.PRICE
ACAMBIS.DEAD...25.09.08...ASK.PRICE
ACAMBIS.DEAD...25.09.08...BID.PRICE"))
## remove (ASK|BID).PRICE
cn.sub <- gsub("(ASK|BID)\\.PRICE$", "", cn)
cn[!(duplicated(cn.sub) | rev(duplicated(rev(cn.sub))))]
# [1] "ABERTIS..IRS....BID.PRICE"
Upvotes: 2