Stuck importing NetCDF file into Pandas DataFrame

Question

I'm a beginner and have been working on this for a while. I want to read a NetCDF file and import multiple (~50) variables, each with 17520 records, into a Pandas DataFrame. At the moment I have it set up for a list of 4 variables, but I want to be able to expand that. The code below does work for 4 variables; any help on how to loop through the list so it works with 50 variables would be great. I know it's not pretty - still learning!

My other question is that when I try to pass the NumPy arrays directly to a Pandas DataFrame, it doesn't work as expected: it creates a DataFrame with 17520 columns. It should be the other way around (transposed). If I create a Series, it works fine. So I have had to use the following lines to get around this, and I'm not even sure why they work. Any suggestions for a better way (especially when it comes to 50 variables)?

d = {vnames[0]: vartemp[0], vnames[1]: vartemp[1], vnames[2]: vartemp[2], vnames[3]: vartemp[3]}
hs = pd.DataFrame(d, index=times)

The whole code is pasted below:

import pandas as pd
import datetime as dt
import xlrd
import numpy as np
import netCDF4


def excel_to_pydate(exceldate):
    datemode = 0  # datemode: 0 for 1900-based, 1 for 1904-based
    pyear, pmonth, pday, phour, pminute, psecond = xlrd.xldate_as_tuple(exceldate, datemode)
    py_date = dt.datetime(pyear, pmonth, pday, phour, pminute, psecond)
    return py_date

def main():
    filename = 'HowardSprings_2010_L4.nc'
    # Define a list of the variable names we want from the NetCDF file
    vnames = ['xlDateTime', 'Fa', 'Fh', 'Fg']

    # Open the NetCDF file
    nc = netCDF4.Dataset(filename)

    # Create some lists of size equal to the length of the vnames list.
    temp = list(range(len(vnames)))
    vartemp = list(range(len(vnames)))

    # Enumerate the list and assign each NetCDF variable to an element in the lists.
    # First get the NetCDF variable object and assign it to temp.
    # Then strip the data from that and add it to a temporary variable (vartemp).
    for index, variable in enumerate(vnames):
        temp[index] = nc.variables[variable]
        vartemp[index] = temp[index][:]

    # Now call the function to convert to datetime from Excel. Assume datemode: 0
    times = [excel_to_pydate(elem) for elem in vartemp[0]]

    # Don't know why I can't just pass a list of variables, i.e. [vartemp[0], vartemp[1], vartemp[2]],
    # but this is the only thing that worked.
    # Create Pandas DataFrame using times as index
    d = {vnames[0]: vartemp[0], vnames[1]: vartemp[1], vnames[2]: vartemp[2], vnames[3]: vartemp[3]}
    theDataFrame = pd.DataFrame(d, index=times)

    # Define the missing data value and replace it with NaN in the DataFrame
    missing = -9999
    theDataFrame1 = theDataFrame.replace({vnames[0]: missing, vnames[1]: missing, vnames[2]: missing, vnames[3]: missing}, np.nan)

main()

Answers (1)
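
One possible approach, offered as a minimal sketch rather than a definitive answer: build the dict with a comprehension over vnames, so the same line works for 4 or 50 variables. The sketch also shows why passing a plain list of arrays comes out transposed: pandas reads a list of arrays as a list of rows, whereas a dict maps each key to a column. To keep the example runnable without the NetCDF file, fake_nc is a hypothetical stand-in holding made-up values; it is not part of netCDF4. The date conversion assumes the 1900-based Excel date system (datemode 0 in the question), where serial dates count days from 1899-12-30.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the NetCDF file, with made-up values.
# With netCDF4, each array would come from nc.variables[name][:].
fake_nc = {
    'xlDateTime': np.array([40179.0, 40179.5, 40180.0]),
    'Fa': np.array([1.0, -9999.0, 3.0]),
    'Fh': np.array([4.0, 5.0, 6.0]),
    'Fg': np.array([7.0, 8.0, -9999.0]),
}
vnames = ['xlDateTime', 'Fa', 'Fh', 'Fg']

# A list of arrays is read as a list of rows: each variable becomes one row
# and each record one column, which is the transposed layout in the question.
wide = pd.DataFrame([fake_nc[name] for name in vnames])

# A dict maps column name -> values, so each variable becomes one column.
# The comprehension scales to any number of names in vnames.
data = {name: fake_nc[name] for name in vnames}
tall = pd.DataFrame(data, columns=vnames)

# Convert Excel serial dates to timestamps and use them as the index.
# Assumption: 1900-based date system, i.e. days counted from 1899-12-30.
tall.index = pd.to_datetime(fake_nc['xlDateTime'], unit='D', origin='1899-12-30')

# Replace the missing-data marker with a real NaN (not the string 'NaN').
clean = tall.replace(-9999, np.nan)

print(wide.shape, tall.shape)
print(clean)
```

With the real file, data would presumably be built as {name: nc.variables[name][:] for name in vnames}, which is the loop from the question collapsed into one line. Passing columns=vnames keeps the columns in the order of the list.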