Reputation: 1988
I'm working with a dataset where the names of companies don't quite match the dataset I'd like to merge with.
As part of the solution, I'd like to edit company names down a column in my data frame. Assuming I a list of company names, what I'd like to do is this:
diff = ['some list of tickers']
for security in df.query("tic in @diff").security.unique():
result = re.search(expression, security)[0].upper()
result = result.replace('CORPORATION', 'CORP')
result = result.replace('COMPANY', 'CO')
result = result.replace('OF','')
result = result.replace('F', '')
result = result.strip()
I think there might even an elegant solution where I can run a custom function for modifying the value of a cell, but I'm not familiar enough with pandas to know how to do that yet.
The dataframe I'm editing on loks like this.
caldt tic security curshrt
2672 1988-06-01 ITSCW Industrial Training Systems, 0
8007 1988-07-01 ITSCW Industrial Training Systems, 0
44772 1989-03-01 MMNT Momentum, Inc. 0
49865 1989-04-01 MMNT Momentum, Inc. 372
54925 1989-05-01 MMNT Momentum, Inc. 78
... ... ... ... ...
1077214 2007-06-01 ABBI Abraxis BioScience, Inc. - Common stock 4486255
1080530 2007-07-01 ABBI Abraxis BioScience, Inc. - Common stock 4659919
1083835 2007-08-01 ABBI Abraxis BioScience, Inc. - Common stock 4265095
1087148 2007-09-01 ABBI Abraxis BioScience, Inc. - Common stock 4018741
1090424 2007-10-01 ABBI Abraxis BioScience, Inc. - Common stock 4064510
I'm getting rid of punctuation and changing some of the abbreviations and words
Upvotes: 1
Views: 71
Reputation: 25259
I think your codes may be modified as follows:
import re
diff = ['some list of tickers']
d = dict([('CORPORATION', 'CORP'), ('COMPANY', 'CO'), ('OF',''), ('F', '')])
s = df.loc[df.tic.isin(diff), 'security']
df.loc[df.tic.isin(diff), 'security'] = (s.str.findall(expression, flags=re.I).str[0].
str.upper().replace(d))
Note: expression
is your search pattern.
Upvotes: 2