shahzadalam

Reputation: 29

Python code was working fine yesterday, now getting this "IndexError: list index out of range"

I am trying to scrape data from a website.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.mohfw.gov.in/"

r = requests.get(url)

html = r.text
soup = BeautifulSoup(html, 'html.parser')
#print(soup)

# Avoid naming the variable `id`, which would shadow the built-in function.
cases_div = soup.find('div', id='cases')

table_body = cases_div.find('tbody')
table_rows = table_body.find_all('tr')

sl_no = []
States = []
Cases = []
Recovered = []
Deaths = []

I am trying to loop over the table rows and append each cell to the empty lists above, but the loop raises an error:

for tr in table_rows:
    td = tr.find_all('td')
    sl_no.append(td[0].text)
    States.append(td[1].text)
    Cases.append(td[2].text)
    Recovered.append(td[3].text)
    Deaths.append(td[-1].text)


headers = ['sl_no','States','Cases','Recovered','Deaths']
df = pd.DataFrame(list(zip(sl_no,States,Cases,Recovered,Deaths)),columns=headers)
df1 = df.drop(index=27)

This is the error:

States.append(td[1].text)
IndexError: list index out of range

Upvotes: 0

Views: 58

Answers (2)

Ido

Reputation: 138

It seems that one of the <tr> elements does not contain all the <td> cells you expected.

From a quick look at the data itself, the last <tr> seems to contain some kind of summary for all the states. In that case you should probably leave the last <tr> out of your for loop:

for tr in table_rows[:-1]:

Or wrap it with:

for tr in table_rows:
    try:
        td = tr.find_all('td')
        sl_no.append(td[0].text)
        States.append(td[1].text)
        Cases.append(td[2].text)
        Recovered.append(td[3].text)
        Deaths.append(td[-1].text)
    except IndexError:
        # The row has fewer cells than expected; pass or handle it as you wish.
        pass
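
One caveat with the try/except approach: sl_no.append(td[0].text) runs before td[1] is accessed, so a one-cell row still adds an entry to sl_no before the exception fires, and the lists end up with different lengths (zip then silently truncates to the shortest). A minimal sketch of one way around that, reading every cell first and appending only if all reads succeed. The rows list is hypothetical sample data standing in for the text of the scraped td elements:

```python
# Hypothetical sample rows standing in for [td.text for td in tr.find_all('td')].
rows = [
    ["1", "Andhra Pradesh", "23", "1", "0"],
    ["2", "Bihar", "15", "0", "1"],
    ["footnote"],  # short row, like the one that raises IndexError
]

sl_no, states, cases, recovered, deaths = [], [], [], [], []

for cells in rows:
    try:
        # Read every cell before touching the output lists.
        record = (cells[0], cells[1], cells[2], cells[3], cells[-1])
    except IndexError:
        continue  # skip the short row without leaving partial data behind
    for column, value in zip((sl_no, states, cases, recovered, deaths), record):
        column.append(value)

print(len(sl_no), len(states))  # 2 2
```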

Upvotes: 0

jezrael

Reputation: 862651

You can check the length of each td list. The problem is that the last one has length 1, so selecting the second value of the list with td[1] raises the error:

for tr in table_rows:
    td = tr.find_all('td')
    print(len(td))
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
4
1
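
Counting the lengths instead of printing one per line can make the irregular rows easier to spot. A minimal sketch using collections.Counter; the rows list is hypothetical placeholder data built to match the lengths printed above, and in the real script each entry would come from tr.find_all('td'):

```python
from collections import Counter

# Hypothetical stand-in for the scraped rows: 27 rows with 5 cells,
# one 4-cell row and one 1-cell row, matching the lengths printed above.
rows = [["x"] * 5 for _ in range(27)] + [["x"] * 4, ["x"]]

length_counts = Counter(len(cells) for cells in rows)
print(length_counts)  # Counter({5: 27, 4: 1, 1: 1})

# Indexes of the rows whose length differs from the most common one.
expected, _ = length_counts.most_common(1)[0]
odd_rows = [i for i, cells in enumerate(rows) if len(cells) != expected]
print(odd_rows)  # [27, 28]
```

Note that the 4-cell row would not raise IndexError on its own, because indexes 0 to 3 and -1 all exist; only the 1-cell row triggers the exception.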

So change your solution to keep only the rows whose td list has length 5:

for tr in table_rows:
    td = tr.find_all('td')
    if len(td) == 5:
        sl_no.append(td[0].text)
        States.append(td[1].text)
        Cases.append(td[2].text)
        Recovered.append(td[3].text)
        Deaths.append(td[-1].text)

headers = ['sl_no','States','Cases','Recovered','Deaths']
df = pd.DataFrame(list(zip(sl_no,States,Cases,Recovered,Deaths)),columns=headers)

print(df)
   sl_no                       States Cases Recovered Deaths
0      1               Andhra Pradesh    23         1      0
1      2  Andaman and Nicobar Islands     9         0      0
2      3                        Bihar    15         0      1
3      4                   Chandigarh     8         0      0
4      5                 Chhattisgarh     7         0      0
5      6                        Delhi    87         6      2
6      7                          Goa     5         0      0
7      8                      Gujarat    69         1      6
8      9                      Haryana    36        18      0
9     10             Himachal Pradesh     3         0      1
10    11            Jammu and Kashmir    48         2      2
11    12                    Karnataka    83         5      3
12    13                       Kerala   202        19      1
13    14                       Ladakh    13         3      0
14    15               Madhya Pradesh    47         0      3
15    16                  Maharashtra   198        25      8
16    17                      Manipur     1         0      0
17    18                      Mizoram     1         0      0
18    19                       Odisha     3         0      0
19    20                   Puducherry     1         0      0
20    21                       Punjab    38         1      1
21    22                    Rajasthan    59         3      0
22    23                   Tamil Nadu    67         4      1
23    24                    Telengana    71         1      1
24    25                  Uttarakhand     7         2      0
25    26                Uttar Pradesh    82        11      0
26    27                  West Bengal    22         0      2
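
As a variation on the length filter, the expected width can be tied to the headers list instead of the literal 5, and each row collected as one record rather than into five parallel lists. This is only a sketch: the rows list and its values are made-up sample data standing in for the text of the scraped td elements.

```python
headers = ['sl_no', 'States', 'Cases', 'Recovered', 'Deaths']

# Hypothetical sample rows standing in for [td.text for td in tr.find_all('td')].
rows = [
    ["1", "Andhra Pradesh", "23", "1", "0"],
    ["2", "Bihar", "15", "0", "1"],
    ["Total", "38", "1", "1"],  # 4-cell row (made-up values)
    ["footnote"],               # 1-cell row
]

# Keep only rows that have exactly one cell per header.
records = [dict(zip(headers, cells)) for cells in rows if len(cells) == len(headers)]

print(len(records))          # 2
print(records[1]["States"])  # Bihar
```

A list of dicts like records can be passed directly to pd.DataFrame(records) to build the same table.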

I think you can simplify your code with read_html:

url = "https://www.mohfw.gov.in/"
df = pd.read_html(url)[-1]

And then remove the last 2 rows:

df = df.iloc[:-2]

print(df)

   S. No.           Name of State / UT Total Confirmed cases *  \
0       1               Andhra Pradesh                      23   
1       2  Andaman and Nicobar Islands                       9   
2       3                        Bihar                      15   
3       4                   Chandigarh                       8   
4       5                 Chhattisgarh                       7   
5       6                        Delhi                      87   
6       7                          Goa                       5   
7       8                      Gujarat                      69   
8       9                      Haryana                      36   
9      10             Himachal Pradesh                       3   
10     11            Jammu and Kashmir                      48   
11     12                    Karnataka                      83   
12     13                       Kerala                     202   
13     14                       Ladakh                      13   
14     15               Madhya Pradesh                      47   
15     16                  Maharashtra                     198   
16     17                      Manipur                       1   
17     18                      Mizoram                       1   
18     19                       Odisha                       3   
19     20                   Puducherry                       1   
20     21                       Punjab                      38   
21     22                    Rajasthan                      59   
22     23                   Tamil Nadu                      67   
23     24                    Telengana                      71   
24     25                  Uttarakhand                       7   
25     26                Uttar Pradesh                      82   
26     27                  West Bengal                      22   

   Cured/Discharged/Migrated Death  
0                          1     0  
1                          0     0  
2                          0     1  
3                          0     0  
4                          0     0  
5                          6     2  
6                          0     0  
7                          1     6  
8                         18     0  
9                          0     1  
10                         2     2  
11                         5     3  
12                        19     1  
13                         3     0  
14                         0     3  
15                        25     8  
16                         0     0  
17                         0     0  
18                         0     0  
19                         0     0  
20                         1     1  
21                         3     0  
22                         4     1  
23                         1     1  
24                         2     0  
25                        11     0  
26                         0     2  

Upvotes: 1
