Reputation: 1508
I am scraping data from a website using this code and loading the data to a pandas dataframe. I get multiple entries per iteration:
import pandas as pd
import requests

data = []
for i in range(0, 24):
    for j in range(1, 15):
        if i < 9:
            URL = 'https://www.weltfussball.de/spielerliste/bundesliga-200' + str(i) + '-' + '200' + str(i+1)
            URL_ = URL + '/nach-name/' + str(j) + '/'
            response = requests.get(URL_, headers={'User-Agent': 'Mozilla/5.0'})
            data.append(pd.read_html(response.text)[1])
df = pd.concat(data).reset_index()
To identify the iteration, I want to add a column to each row of the dataframe that holds the corresponding iterator i: 0 for the entries of the first iteration, then 1, and so on. How do I have to amend my code?
Upvotes: 1
Views: 52
Reputation: 71689
Instead of appending the dataframes to a list, store them in a dictionary keyed by the iteration (i, j). pd.concat will then automatically add the keys as a MultiIndex for you.
data = {}
for i in range(0, 2):
    for j in range(1, 3):
        if i < 9:
            ...
            data[(i, j)] = pd.read_html(response.text)[1]
df = pd.concat(data).reset_index(level=2, drop=True)
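If you want i (and j) as ordinary columns rather than index levels, here is a minimal sketch of the same idea that runs without hitting the website. The small frames and the "player" column are made-up stand-ins for the scraped tables, and passing names=["i", "j"] is my own addition to label the index levels.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped tables: one small frame per (i, j).
data = {}
for i in range(0, 2):
    for j in range(1, 3):
        data[(i, j)] = pd.DataFrame({"player": [f"p{i}{j}a", f"p{i}{j}b"]})

# The tuple keys become the two outer index levels; names= labels them.
df = pd.concat(data, names=["i", "j"])

# Drop the per-table row number (level 2), then move i and j into columns.
df = df.reset_index(level=2, drop=True).reset_index()

print(df)
```

After the final reset_index() every row carries the i and j it came from, so you can filter or group on df["i"] directly.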
Upvotes: 1