adc admin
adc admin

Reputation: 23

scraping table from web to excel using python pandas, storing numbers as text in excel. How to store as Value?

I am trying to scrape a table from a website using pandas. The code is shown below:

import pandas as pd

url = "http://mnregaweb4.nic.in/netnrega/state_html/empstatusnewall_scst.aspx?page=S&lflag=eng&state_name=KERALA&state_code=16&fin_year=2020-2021&source=national&Digest=s5wXOIOkT98cNVkcwF6NQA"

df1 = pd.read_html(url)[3]

df1.to_excel("combinedGP.xlsx", index=False)

In the resulting excel file, the numbers are saved as text. Since I am planning to build a file with around 1000 rows, I cannot manually change the data type. So is there another way to store them as actual values and not text? TIA

Upvotes: 0

Views: 220

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31166

The website can be very unresponsive...

  • there are unwanted header rows, and two rows of column headers
  • simple way to manage this is to_csv(), from_csv() with appropriate parameters.
import pandas as pd 
import io
url = "http://mnregaweb4.nic.in/netnrega/state_html/empstatusnewall_scst.aspx?page=S&lflag=eng&state_name=KERALA&state_code=16&fin_year=2020-2021&source=national&Digest=s5wXOIOkT98cNVkcwF6NQA" 
df1 = pd.read_html(url)[3] 

df1 = pd.read_csv(io.StringIO(df1.to_csv(index=False)), skiprows=3, header=[0,1])
# df1.to_excel("combinedGP.xlsx", index=False)

sample after cleaning up

  S.No   District HH issued jobcards                       No. of HH Provided Employment                       EMP. Provided No. of Persondays generated                                    Families Completed 100 Days                   
  S.No   District                SCs   STs  Others   Total                           SCs   STs  Others   Total  No. of Women                         SCs     STs   Others    Total    Women                         SCs  STs Others  Total
0  1.0  ALAPPUZHA              32555   760  254085  287400                         20237   565  132744  153546        157490                     1104492   40209  6875586  8020287  7635748                        1346  148   5840   7334
1  2.0  ERNAKULAM              36907  2529  212534  251970                         15500  1517   68539   85556         82270                      908035  104040  3788792  4800867  4467329                        2848  301  11953  15102

Upvotes: 2

Related Questions