azrosen92

Reputation: 9117

Historical weather data from NOAA

I am working on a data mining project and would like to gather historical weather data. I can get historical data through NOAA's web interface at http://www.ncdc.noaa.gov/cdo-web/search, but I would like to access it programmatically through an API. From what I have read on StackOverflow, this data is supposed to be public domain, yet the only places I have found it are non-free services like Wunderground. How can I access this data for free?

Upvotes: 10

Views: 13465

Answers (3)

Anil Kumar

Reputation: 445

Dependencies

  1. pip install selenium
  2. Download ChromeDriver ('chromedriver.exe' on Windows): https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_win32.zip

Once the driver and library are installed, find the code for each required location by clicking on the map on the source website: https://www.weather.gov/wrh/climate

# Keys for the required locations

# RECAP NAME     CLICK ON MAP         SELECT UNDER 1. LOCATION
# Dallas         Fort Worth (fwd)     Dallas Area
# Florida        Miami (mfl)          Miami Area
# New York       New York (okx)       NY-Central Park Area
# Minneapolis    Minneapolis (mpx)    Minneapolis Area
# California     Los Angeles (lox)    LA Downtown Area

state_code_dict = {'Dallas':['fwd',3],'Florida':['mfl',1],
                   'New York':['okx',24],'Minneapolis':['mpx',1],
                   'California':['lox',2]}

The number in each state_code_dict entry is the position of the required area in that office's location dropdown. For example, the code for Florida is 'mfl', and the Miami area is the 1st entry in its dropdown list.
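To make the mapping concrete, here is a minimal sketch of how one entry turns into the page URL and the dropdown XPath that the script below builds. The helper name build_targets is hypothetical (it is not part of the script), and the XPath assumes the page layout has not changed.

```python
# Same mapping as above: recap name -> [office code, dropdown position]
state_code_dict = {'Dallas': ['fwd', 3], 'Florida': ['mfl', 1],
                   'New York': ['okx', 24], 'Minneapolis': ['mpx', 1],
                   'California': ['lox', 2]}


def build_targets(recap_name, codes=state_code_dict):
    """Hypothetical helper: return (page URL, XPath of the dropdown option)."""
    office_code, position = codes[recap_name]
    url = "https://nowdata.rcc-acis.org/" + office_code + "/"
    # XPath positions are 1-based, so the stored number is used as-is
    option_xpath = "/html/body/div[1]/div[3]/select/option[" + str(position) + "]"
    return url, option_xpath


for name in state_code_dict:
    print(name, build_targets(name))
```

Keeping the lookup in one place like this makes it easier to check a new location before launching a browser session.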

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service('chromedriver.exe')

df_ = pd.DataFrame() #(columns = ['Date','Average','Recap_name'])
for i in state_code_dict.keys():
    
    #Load the driver with webpage
    driver = webdriver.Chrome(options=options, service=webdriver_service)
    wait = WebDriverWait(driver, 30)
    print("Running for: ",i)
    ## Below url redirects to the data page
    ## source site is (https://www.weather.gov/wrh/climate)
    url = "https://nowdata.rcc-acis.org/" + state_code_dict[i][0] + "/"
    select_location = "/html/body/div[1]/div[3]/select/option[" + str(state_code_dict[i][1]) + "]"
    select_date = "tDatepicker"
    
    ## Give desired date/month in 'yyyy-mm' format, as it pulls the complete month data at once.
    set_date = "'2023-07'"
    date_freeze = "arguments[0].value = "+ set_date
    
    # XPath of the Go button, which opens the next window. XPaths can be found with Inspect Element in Chrome.
    click_go = "//*[@id='go']"
    wait_table_span = "//*[@id='results_area']/table[1]/caption/span"
    enlarge_click = "/html/body/div[5]/div[1]/button[1]"
    
    # Get the temperature table from the resulting HTML using the XPath below
    get_table = '//*[@id="results_area"]'
    try:
        driver.get(url)
        # wait up to 20 seconds for each element to appear
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,select_location)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID,select_date)))
        driver.execute_script(date_freeze, element)
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,click_go)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,wait_table_span)))
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,enlarge_click)))
        element.click()
        data = driver.find_element(By.XPATH,get_table).get_attribute("innerHTML")
        df = pd.read_html(data)
        df[0].columns = df[0].columns.droplevel(0)
        df_all = df[0][['Date','Average']].copy()  # copy so the new column is not set on a slice of df[0]
        df_all['Recap_name'] = i
    finally:
        driver.quit()
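    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df_ = pd.concat([df_, df_all], ignore_index=True)

## Write each location's data to its own sheet in the Excel file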
with pd.ExcelWriter("avg_temp.xlsx") as writer:
    for i in state_code_dict.keys():
        df_write = df_[df_.Recap_name == i]
        df_write.to_excel(writer, sheet_name=i, index=False)
    print("--------Finished----------")

Upvotes: 0

Capacytron

Reputation: 3739

As far as I know, all NOAA historical weather data is available for free through the upgini Python library: https://upgini.com

However, you can only get this data as part of training an ML model. upgini enriches a dataframe with only the relevant data columns, where relevance means the significance of a column (for example, temperature) for predicting some target event.

If you have such a task, try running data enrichment with upgini to get NOAA historical weather data for free:

%pip install upgini

from upgini import FeaturesEnricher, SearchKey
enricher = FeaturesEnricher(search_keys={'rep_date': SearchKey.DATE,
                                         'country': SearchKey.COUNTRY,
                                         'postal_code': SearchKey.POSTAL_CODE})
enricher.fit(X_train, Y_train)

Upvotes: 0

Brian

Reputation: 461

For a list of all web service APIs provided by the National Climatic Data Center, see http://www.ncdc.noaa.gov/cdo-web/webservices

Full documentation for the API that backs the search page you linked: http://www.ncdc.noaa.gov/cdo-web/webservices/v2

The API requires a token and is limited to 1000 requests per day. If you need the limit increased for legitimate reasons, contact http://www.ncdc.noaa.gov/customer-support.
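As a rough sketch of what a token-authenticated call could look like, using only the standard library: the base URL, the "token" header name, and the query parameters (datasetid, startdate, enddate, limit) are my reading of the v2 documentation linked above, so treat them as assumptions and check them against the docs. The helper names build_request and fetch_json are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL for the v2 web services; verify against the documentation
BASE_URL = "https://www.ncdc.noaa.gov/cdo-web/api/v2"


def build_request(endpoint, token, **params):
    """Build (but do not send) a GET request carrying the access token."""
    url = BASE_URL + "/" + endpoint
    if params:
        # Sort the parameters so the generated URL is deterministic
        url += "?" + urlencode(sorted(params.items()))
    return Request(url, headers={"token": token})


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body (needs network access)."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example: one month of daily summaries; replace the token with your own
request = build_request("data", "YOUR_TOKEN", datasetid="GHCND",
                        startdate="2023-07-01", enddate="2023-07-31", limit=1000)
print(request.full_url)
# data = fetch_json(request)  # uncomment once you have a valid token
```

Each call counts against the daily request limit, so it is worth asking for the largest page size the API allows rather than looping over many small requests.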

For bulk downloads, use FTP: ftp://ftp.ncdc.noaa.gov/pub/data/
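If you want to browse that directory from a script, a minimal sketch with the standard ftplib module could look like the following. It assumes the server still accepts anonymous FTP logins; list_directory and main are illustrative names, not anything NOAA provides.

```python
from ftplib import FTP

FTP_HOST = "ftp.ncdc.noaa.gov"   # host taken from the FTP link above
BULK_PATH = "/pub/data/"         # path taken from the FTP link above


def list_directory(ftp, path):
    """Change to `path` on an open FTP connection and return sorted entry names."""
    ftp.cwd(path)
    return sorted(ftp.nlst())


def main():
    # Assumption: anonymous login is allowed; this needs network access
    with FTP(FTP_HOST, timeout=30) as ftp:
        ftp.login()
        for name in list_directory(ftp, BULK_PATH):
            print(name)
```

Call main() to print the top-level entries; from there, ftp.retrbinary can be used to download individual files.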

Upvotes: 10
