Tobitor

Reputation: 1508

How to add the scrape iterator to a pandas dataframe for each row?

I am scraping data from a website using this code and loading the data to a pandas dataframe. I get multiple entries per iteration:

import pandas as pd
import requests

data = []
for i in range(0, 24):
    for j in range(1, 15):
        if i < 9:
            URL = 'https://www.weltfussball.de/spielerliste/bundesliga-200' + str(i) + '-' + '200' + str(i+1)
            URL_ = URL + '/nach-name/' + str(j) + '/'
            response = requests.get(URL_, headers={'User-Agent': 'Mozilla/5.0'})
            data.append(pd.read_html(response.text)[1])
df = pd.concat(data).reset_index()

To identify the iteration, I want to add a column to the dataframe that holds the corresponding iterator i for each row: 0 for the entries of the first iteration, 1 for the second, and so on. How should I amend my code?

Upvotes: 1

Views: 52

Answers (1)

Shubham Sharma

Reputation: 71689

Instead of appending the dataframes to a list, store them in a dictionary keyed by the iteration (i, j). pd.concat then uses the dictionary keys to build a MultiIndex for you, so every row carries the i and j it came from.

Update your code as follows:

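data = {}
for i in range(0, 2):
    for j in range(1, 3):
        if i < 9:
            ...
            # key each table by the iteration it came from
            data[(i, j)] = pd.read_html(response.text)[1]

# the keys become index levels 0 (i) and 1 (j); level 2 is each table's own row index, which is dropped here
df = pd.concat(data).reset_index(level=2, drop=True)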
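
If you want i and j as ordinary columns rather than index levels, one option is to name the key levels and reset the index once more. Below is a minimal sketch that runs without network access: fake_table is a hypothetical stand-in for pd.read_html(response.text)[1], and the player column and its values are made up for illustration.

```python
import pandas as pd


def fake_table(i, j):
    # Hypothetical stand-in for pd.read_html(response.text)[1];
    # the "player" column and its values are invented for this sketch.
    return pd.DataFrame({"player": [f"p{i}{j}a", f"p{i}{j}b"]})


data = {}
for i in range(0, 2):
    for j in range(1, 3):
        data[(i, j)] = fake_table(i, j)

# The tuple keys become the two outer index levels; names labels them.
df = pd.concat(data, names=["i", "j"])

# Drop each table's own row numbers (level 2), then move i and j into columns.
df = df.reset_index(level=2, drop=True).reset_index()

print(df)
```

If only i is needed, tagging each table before appending it to a list, for example with DataFrame.assign(i=i), should give a similar result without the MultiIndex step.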

Upvotes: 1
