Reputation: 299
I have no idea what I'm doing, but the goal of this code is to scrape all link hrefs from several pages (I did not include the pagination code on purpose) and store them in a pandas DataFrame. I would like to print all of the rows from df2 once the loop has finished. The "for i in range(0, 10)" only runs the loop 10 times and appends the links 10 times.
How do I code it so that it continues appending all the links (not limited to 10)? Sorry for being a newbie.
for linkurl in linkcontainer:
    link = linkurl.find_element_by_xpath('.//div[2]/div/div/span/a').get_attribute("href")
    df_links = pd.DataFrame([[link]], columns=['link'])
    df2 = pd.DataFrame()
    for i in range(0, 10):
        df2 = df2.append(df_links)

# loop breaks here when it paginates through all pages
print(df2.link.to_string(index=False, header=False))
Upvotes: 0
Views: 170
Reputation: 21
What you are doing is overwriting your dataframe on every iteration of the loop. You need to collect the links in a list (or dictionary) first and build the dataframe once at the end, for example:
links = []
for linkurl in linkcontainer:
    link = linkurl.find_element_by_xpath('.//div[2]/div/div/span/a').get_attribute("href")
    links.append(link)

# loop breaks here when it paginates through all pages
df2 = pd.DataFrame({'links': links})
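If you want the pagination folded in (the question says it was left out on purpose), a minimal sketch of the same list-accumulation pattern across pages could look like the following. Here driver, the container XPath, and the "next" link XPath are all assumptions; substitute whatever your pages actually use:

links = []
while True:
    # collect every link element on the current page (container XPath is hypothetical)
    linkcontainer = driver.find_elements_by_xpath('//div[@class="result"]')
    for linkurl in linkcontainer:
        link = linkurl.find_element_by_xpath('.//div[2]/div/div/span/a').get_attribute("href")
        links.append(link)
    # stop when there is no "next" control left (XPath is hypothetical)
    next_page = driver.find_elements_by_xpath('//a[@rel="next"]')
    if not next_page:
        break
    next_page[0].click()

df2 = pd.DataFrame({'links': links})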
Depending on your IDE you can print your rows in a number of ways. The simplest is to call print(df2), or, if you truly want to iterate over your dataframe:
for index, row in df2.iterrows():
    print(row)
Upvotes: 1
Reputation: 590
iterrows would do it:

for ind, row in df_links.iterrows():
    df2.loc[len(df2), :] = row
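For this to work, df2 has to exist beforehand with matching columns, which the snippet doesn't show. A self-contained sketch, using placeholder links in place of the scraped data:

import pandas as pd

# placeholder data standing in for the scraped links
df_links = pd.DataFrame({'link': ['https://example.com/a', 'https://example.com/b']})

# pre-create an empty frame with the same columns
df2 = pd.DataFrame(columns=df_links.columns)
for ind, row in df_links.iterrows():
    # write each row at the next integer position, growing df2 by one row
    df2.loc[len(df2), :] = row

print(df2.link.to_string(index=False, header=False))

Note that growing a dataframe row by row like this is slow for large inputs; collecting into a list first, as in the other answer, scales better.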
Upvotes: 0