Reputation: 329
Can someone tell me how to add data to a pandas DataFrame in Python when multiple threads call a function that has to append rows to that DataFrame?
My code scrapes data from a URL, and I was using df.loc[index]... to add the scraped row to the DataFrame.
I've now made it multithreaded, assigning each URL to its own thread, so many pages are being scraped at once.
How do I append those rows to the DataFrame?
Upvotes: 8
Views: 15262
Reputation: 12039
Adding rows to a DataFrame one by one is not recommended. Instead, have each thread return its data as a dict (or list), collect those results in a single list, and call the DataFrame constructor only once, at the end, on the full data set.
Example:
# help from http://stackoverflow.com/a/28463266/3393459
# and http://stackoverflow.com/a/2846697/3393459
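from multiprocessing.dummy import Pool as ThreadPool
import requests
import pandas as pd

pool = ThreadPool(4)  # pool of 4 worker threads

# called by each thread; returns one row as a dict
def get_web_data(url):
    return {'col1': 'something', 'request_data': requests.get(url).text}

urls = ["http://google.com", "http://yahoo.com"]

# one dict per URL, in the same order as urls
results = pool.map(get_web_data, urls)
pool.close()
pool.join()

print(results)
# build the DataFrame once, from the complete list of rows
print(pd.DataFrame(results))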
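
The same idea can also be sketched with the standard library's concurrent.futures module instead of multiprocessing.dummy. This is only a minimal sketch: fetch_row and scrape_to_dataframe are hypothetical names, and fetch_row is a stand-in that does no network access, so you would replace its body with your real scraping code.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def fetch_row(url):
    # Hypothetical stand-in for the real scraping code; replace the body
    # with requests.get(url) plus parsing. It must return one row as a dict.
    return {"url": url, "length": len(url)}


def scrape_to_dataframe(urls, max_workers=4):
    # Each worker thread only builds and returns its own dict, so no
    # thread ever writes to a shared DataFrame.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as `urls`.
        rows = list(executor.map(fetch_row, urls))
    # Build the DataFrame once, in the main thread, from all rows.
    return pd.DataFrame(rows)


urls = ["http://google.com", "http://yahoo.com"]
df = scrape_to_dataframe(urls)
print(df)
```

Because the threads never touch the DataFrame themselves, this layout should not need a lock around df.loc[...] or any other shared state.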
Upvotes: 8