Aquarius
Aquarius

Reputation: 262

How to find unique name/character from a list of names in R

I have a huge list of company names. As illustrated below, if name company is ABBEYCREST.DEAD...10.10.14...ASK.PRICE, this means ABBEYCREST.DEAD...10.10.14... is name of company and ASK.PRICE is ASK Price data and when it ends with BID.PRICE is means its the BID PRICE data. I want to identify the company whose only one column name is avaiable in the dataframe. Actually I have a dataframe which has colum headers as illustrated below, implying each company should have 2 columns, if there are 4000 companies so there should be 8000 columns in my dataframe but I have 7999 ( although my dataframe has a date column but I exclude it when I count columns).

df<-AskBid

    ABBEYCREST.DEAD...10.10.14...ASK.PRICE
    ABBEYCREST.DEAD...10.10.14...BID.PRICE
    ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
    ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
    ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
    ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
    ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
    ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
    ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
    ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
    ABERTIS..IRS....BID.PRICE
    ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
    ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
    ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
    ABLON.GROUP.DEAD...31.05.13...BID.PRICE
    ACAMBIS.DEAD...25.09.08...ASK.PRICE
    ACAMBIS.DEAD...25.09.08...BID.PRICE

I want to find is

missing <- df
ABERTIS..IRS....BID.PRICE

I would really appreciate your help. This is causing problems in my estimations.

Upvotes: 0

Views: 146

Answers (2)

Gopala
Gopala

Reputation: 10473

Here is another solution assuming text is the column name in the data frame read in:

library(dplyr)
df$text <- gsub(("(ASK|BID)", "", df$text)
df %>% group_by(text) %>% filter(n() != 2)

Upvotes: 0

sgibb
sgibb

Reputation: 25726

You can remove the ASK.PRICE and BID.PRICE part and call duplicated twice (the second time on the reversed order):

cn <- readLines(textConnection(
"ABBEYCREST.DEAD...10.10.14...ASK.PRICE
ABBEYCREST.DEAD...10.10.14...BID.PRICE
ABBOT.GROUP.DEAD...07.03.08...ASK.PRICE
ABBOT.GROUP.DEAD...07.03.08...BID.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..FULLY.PAID.23.09.05...BID.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...ASK.PRICE
ABERDEEN.ASSET.MAN..NIL.PAID.23.09.05...BID.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...ASK.PRICE
ABERDEEN.FTBL.CLUB.DEAD...DEAD.04.08.03...BID.PRICE
ABERTIS..IRS....BID.PRICE
ABGENIX..IRS..DEAD...12.11.07...ASK.PRICE
ABGENIX..IRS..DEAD...12.11.07...BID.PRICE
ABLON.GROUP.DEAD...31.05.13...ASK.PRICE
ABLON.GROUP.DEAD...31.05.13...BID.PRICE
ACAMBIS.DEAD...25.09.08...ASK.PRICE
ACAMBIS.DEAD...25.09.08...BID.PRICE"))

## remove (ASK|BID).PRICE
cn.sub <- gsub("(ASK|BID)\\.PRICE$", "", cn)

cn[!(duplicated(cn.sub) | rev(duplicated(rev(cn.sub))))]
# [1] "ABERTIS..IRS....BID.PRICE"

Upvotes: 2

Related Questions