W.R

Reputation: 187

nested for loops with pandas dataframe

I am looping through a dataframe column of headlines (sp500news) and comparing each headline against a dataframe of company names (co_names_df). I want to increment a company's Frequency each time its name appears in a headline.

My current code, below, does not update the Frequency column. Is there a cleaner, faster implementation, maybe one without the for loops?

for title in sp500news['title']:
    for string in title:
        for co_name in co_names_df['Name']:
            if string == co_name:
                co_names_index = co_names_df.loc[co_names_df['Name']=='string'].index
                co_names_df['Frequency'][co_names_index] += 1
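For comparison, here is a minimal corrected sketch of the loop above (an editorial sketch, not the asker's code). It assumes each title is a plain string, and the small frames are hypothetical stand-ins for the samples below. It addresses three issues: iterating a string yields single characters rather than words, the lookup compared against the literal 'string' instead of the variable, and the chained assignment co_names_df['Frequency'][co_names_index] += 1 is not guaranteed to write back to co_names_df.

```python
import pandas as pd

# Hypothetical frames mirroring the shape of the samples in the question
co_names_df = pd.DataFrame({"Name": ["3M", "Abbott", "Adobe Systems"], "Frequency": 0})
sp500news = pd.DataFrame({"title": [
    "3M and Abbott report earnings",
    "Adobe Systems shares rise",
    "Abbott recalls device",
]})

for title in sp500news["title"]:
    # Test each name against the whole headline instead of iterating
    # characters; this also catches multi-word names like "Adobe Systems".
    for idx, co_name in co_names_df["Name"].items():
        if co_name in title:
            # A single .loc call writes into co_names_df itself,
            # unlike the chained df['Frequency'][idx] form.
            co_names_df.loc[idx, "Frequency"] += 1

print(co_names_df)
```

Note that co_name in title is a plain substring test: it counts each headline at most once per company, and it would also match "3M" inside a longer word.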

co_names_df sample

    Name           Frequency
0   3M             0
1   A.O. Smith     0
2   Abbott         0
3   AbbVie         0
4   Accenture      0
5   Activision     0
6   Acuity Brands  0
7   Adobe Systems  0
...

sp500news['title'] sample

title
0       Italy will not dismantle Montis labour reform  minister
1       Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4       Xis campaign to draw people back to graying rural China faces uphill battle
6       Romney begins to win over conservatives
8       Oregon mall shooting survivor in serious condition
9       Polands PGNiG to sign another deal for LNG supplies from US CEO
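Since the question asks for something without explicit for loops, one alternative (a swapped-in technique, not taken from the answer below) is to build a single regular expression from all company names and let pandas' Series.str.findall apply it to every headline. The frames here are hypothetical stand-ins for the samples above.

```python
import re
import pandas as pd

# Hypothetical frames mirroring the shape of the samples in the question
co_names_df = pd.DataFrame({"Name": ["3M", "A.O. Smith", "Abbott", "Adobe Systems"], "Frequency": 0})
sp500news = pd.DataFrame({"title": [
    "3M and Abbott report earnings",
    "A.O. Smith beats estimates",
    "Abbott recalls device",
    "No company here",
]})

# Longest names first, so a longer name wins over a shorter name that is its prefix.
names = sorted(co_names_df["Name"], key=len, reverse=True)
# re.escape handles punctuation such as the dots in "A.O. Smith";
# the lookarounds stop "3M" from matching inside a longer word.
pattern = r"(?<!\w)(?:" + "|".join(map(re.escape, names)) + r")(?!\w)"

# findall per headline, flatten the lists of matches, then count each name.
counts = sp500news["title"].str.findall(pattern).explode().value_counts()

# Names that never matched map to NaN, so fill them with 0.
co_names_df["Frequency"] = co_names_df["Name"].map(counts).fillna(0).astype(int)
print(co_names_df)
```

Unlike a substring test, this counts every occurrence, so a headline naming the same company twice adds 2. Matching is case-sensitive.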

Upvotes: 3

Views: 11759

Answers (1)

Nathan

Reputation: 10306

You can probably speed this up; you're using dataframes where other structures would work better. Here's what I would try.

from collections import Counter

counts = Counter()

# checking membership in a set is very fast (O(1) on average)
company_names = set(co_names_df["Name"])

for title in sp500news['title']:
    for word in title.split(): # title is a string; iterating it directly would yield single characters
        if word in company_names:
            counts.update([word])

counts is then a Counter, a dict subclass of the form {company_name: count}. A quick loop over its items is enough to write the counts back into your dataframe's Frequency column.
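A minimal end-to-end sketch of that last step, assuming titles are plain strings (the small frames are hypothetical). Instead of an explicit loop, Series.map looks each name up in counts, and names that never appeared end up as 0.

```python
from collections import Counter
import pandas as pd

# Hypothetical frames mirroring the shape of the samples in the question
co_names_df = pd.DataFrame({"Name": ["3M", "Abbott", "Accenture"], "Frequency": 0})
sp500news = pd.DataFrame({"title": [
    "3M and Abbott report earnings",
    "Abbott recalls device",
]})

company_names = set(co_names_df["Name"])
counts = Counter()
for title in sp500news["title"]:
    # split() yields words, so only single-word names can match here
    counts.update(word for word in title.split() if word in company_names)

# Write all counts back in one step; missing names become 0.
co_names_df["Frequency"] = co_names_df["Name"].map(counts).fillna(0).astype(int)
print(co_names_df)
```

Because title.split() yields single words, multi-word names from the question's sample such as "Adobe Systems" can never match this way; those need a substring or regex match.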

Upvotes: 1
