Bronson77

Reputation: 299

Printing Output Iterating Over DF Rows

I have no idea what I'm doing, but the goal of this code is to scrape all link hrefs from several pages (pagination code intentionally not included) and store them in a pandas DataFrame. I would like to print all of the rows from df2 once the loop has finished. The "for i in range(0,10)" only runs the loop 10 times and appends the links 10 times.

How do I code it so that it continues appending all the links (not limited to 10)? Sorry for being a newbie.

for linkurl in linkcontainer:
    link = linkurl.find_element_by_xpath('.//div[2]/div/div/span/a').get_attribute("href")

    df_links = pd.DataFrame([[link]], columns=['link'])
    df2 = pd.DataFrame()
    for i in range(0,10):
        df2 = df2.append(df_links)

# loop breaks here when it paginates through all pages

print(df2.link.to_string(index=False, header=False))

Upvotes: 0

Views: 170

Answers (2)

Zara Khan

Reputation: 21

What you are doing is overwriting your dataframe on every loop iteration. You need to store the links in some sort of list or dictionary first, for example:

links = []
for linkurl in linkcontainer:
    link = linkurl.find_element_by_xpath('.//div[2]/div/div/span/a').get_attribute("href")
    links.append(link)

# loop breaks here when it paginates through all pages
df2 = pd.DataFrame({'links': links})

Depending on your IDE you can print your rows in a number of ways.

A simple print(df2) call works, or if you truly want to iterate over your dataframe:

for index, row in df2.iterrows():
    print(row)
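
If you only want the link values themselves (matching the print in the question), you can also print the single column directly; a minimal sketch, assuming the 'links' column name used in the snippet above:

# print only the link values, one per line, with no index or header
print(df2['links'].to_string(index=False, header=False))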

Upvotes: 1

AlecZ

Reputation: 590

Iterrows would do it.

for ind, row in df_links.iterrows():
    df2.loc[len(df2), :] = row
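
Note that this assumes df2 already exists with matching columns before the loop runs; a minimal setup sketch, assuming the single 'link' column from the question:

import pandas as pd

# df2 must exist (with the right columns) before rows can be assigned by position
df2 = pd.DataFrame(columns=['link'])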

Upvotes: 0
