sunspots
sunspots

Reputation: 1057

Export multiple html tables to Excel with Pandas as the Middleman

I'm collecting data over the years 1981-2018 from a website, whereby the this link shows the 2018 data:

If one changes 2018 to a year from 1981-2018 in the aforementioned link one obtains the remaining dataset.

Using Pandas and urllib.request I collect the data as follows:

url = ['ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/' + \
   '{}'.format(i) + '/Population.Heating.txt' for i in range(1981,2019)]
data_url = [pd.read_csv(url[i], sep=" ", header=None) for i in range(len(url))]

Questions

First, is there a cleaner and more efficient way of collecting the data from the links than the above list comprehension? Second, how would I export the entire list comprehension to an Excel spreadsheet?

I had tried the following method for exporting, however; the code only exported the year 2018:

from pandas import ExcelWriter

writer = ExcelWriter('PythonExport.xlsx')
for i in range(len(data_url)):
    data_url[i].to_excel(writer,'Sheet1')
writer.save()

To address the question of why I didn't directly import the data to Excel: Ultimately, I would like to have the data for each year in a DataFrame, namely one column contains the Region' data and the other column contains the 'Conus' data. In trying to construct this DataFrame it seemed to be easier to munge the data in Excel than working with the list comprehension data_url from above, then use the data to build the DataFrame.

Upvotes: 1

Views: 892

Answers (1)

Stephen Rauch
Stephen Rauch

Reputation: 49794

Here is a way to parse that data into a single dataframe:

Code:

url = [
    'ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/'
    '{}'.format(i) + '/Population.Heating.txt' for i in range(1981, 2018)]
data_url = [pd.read_csv(url[i], sep="|", skiprows=3, index_col=0).T
            for i in range(len(url))]
df = pd.concat(data_url)

print(df.head())
print(df.tail())

Results:

Region     1   2   3   4   5   6   7   8   9  CONUS
19810101  51  45  36  33  24  24  14  22  14     28
19810102  46  42  43  40  23  29  17  22  16     29
19810103  55  50  51  46  26  28  17  23  14     33
19810104  66  59  62  55  27  30  18  23  15     37
19810105  62  56  59  47  34  42  22  24  14     38

Region     1   2   3   4   5   6   7   8   9  CONUS
20171227  53  49  62  64  22  35  28  29  15     37
20171228  59  54  60  57  27  37  28  26  13     38
20171229  59  53  54  54  26  33  23  24  11     35
20171230  57  50  54  62  24  32  19  27  12     34
20171231  59  55  60  68  29  39  27  30  15     40

Upvotes: 1

Related Questions