Reputation: 187
I am looping through a dataframe column of headlines (sp500news) and comparing against a dataframe of company names (co_names_df). I am trying to update the frequency each time a company name appears in a headline.
My current code is below and is not updating the frequency columns. Is there a cleaner, faster implementation - maybe without the for loops?
for title in sp500news['title']:
for string in title:
for co_name in co_names_df['Name']:
if string == co_name:
co_names_index = co_names_df.loc[co_names_df['Name']=='string'].index
co_names_df['Frequency'][co_names_index] += 1
co_names_df sample
Name Frequency
0 3M 0
1 A.O. Smith 0
2 Abbott 0
3 AbbVie 0
4 Accenture 0
5 Activision 0
6 Acuity Brands 0
7 Adobe Systems 0
...
sp500news['title'] sample
title
0 Italy will not dismantle Montis labour reform minister
1 Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4 Xis campaign to draw people back to graying rural China faces uphill battle
6 Romney begins to win over conservatives
8 Oregon mall shooting survivor in serious condition
9 Polands PGNiG to sign another deal for LNG supplies from US CEO
Upvotes: 3
Views: 11759
Reputation: 10306
You can probably speed this up; you're using dataframes where other structures would work better. Here's what I would try.
from collections import Counter
counts = Counter()
# checking membership in a set is very fast (O(1))
company_names = set(co_names_df["Name"])
for title in sp500news['title']:
for word in title: # did you mean title.split(" ")? or is title a list of strings?
if word in company_names:
counts.update([word])
counts
is then a dictionary {company_name: count}
. You can just do a quick loop over the elements to update the counts in your dataframe.
Upvotes: 1