Reputation: 23
I am trying to scrape a table from a website using pandas. The code is shown below:
import pandas as pd
url = "http://mnregaweb4.nic.in/netnrega/state_html/empstatusnewall_scst.aspx?page=S&lflag=eng&state_name=KERALA&state_code=16&fin_year=2020-2021&source=national&Digest=s5wXOIOkT98cNVkcwF6NQA"
df1 = pd.read_html(url)[3]
df1.to_excel("combinedGP.xlsx", index=False)
In the resulting excel file, the numbers are saved as text. Since I am planning to build a file with around 1000 rows, I cannot manually change the data type. So is there another way to store them as actual values and not text? TIA
Upvotes: 0
Views: 220
Reputation: 31166
The website can be very unresponsive...
to_csv()
, from_csv()
with appropriate parameters.import pandas as pd
import io
url = "http://mnregaweb4.nic.in/netnrega/state_html/empstatusnewall_scst.aspx?page=S&lflag=eng&state_name=KERALA&state_code=16&fin_year=2020-2021&source=national&Digest=s5wXOIOkT98cNVkcwF6NQA"
df1 = pd.read_html(url)[3]
df1 = pd.read_csv(io.StringIO(df1.to_csv(index=False)), skiprows=3, header=[0,1])
# df1.to_excel("combinedGP.xlsx", index=False)
S.No District HH issued jobcards No. of HH Provided Employment EMP. Provided No. of Persondays generated Families Completed 100 Days
S.No District SCs STs Others Total SCs STs Others Total No. of Women SCs STs Others Total Women SCs STs Others Total
0 1.0 ALAPPUZHA 32555 760 254085 287400 20237 565 132744 153546 157490 1104492 40209 6875586 8020287 7635748 1346 148 5840 7334
1 2.0 ERNAKULAM 36907 2529 212534 251970 15500 1517 68539 85556 82270 908035 104040 3788792 4800867 4467329 2848 301 11953 15102
Upvotes: 2