Stephen Ellwood
Stephen Ellwood

Reputation: 424

xlrd named range example?

I have an excel spreadsheet that I am trying to parse with xlrd. The spreadsheet itself makes extensive use of named ranges.

If I use:

for name in book.name_map:
    print(name)

I can see all of the names are there.

However I can't make any of the methods work (cell method and area2d). Can anyone give me an example of the syntax to be able to read the cell range that a name is pointing to given the name.

The Excel file is an XLSM file with lots of visual basic that also operates on these named ranges.

Upvotes: 0

Views: 993

Answers (1)

Stephen Ellwood
Stephen Ellwood

Reputation: 424

I think that the naming support in XLRD is broken for XLSM files but I found an answer by switching to openpyxl. This has a function get_named_ranges() which contains all of the named ranges. The support after that is a bit thin so I wrote my own class to turn the named ranges in my spreadsheet into a class where I can access the same information using the same names. # -- coding: utf-8 -- """ Created on Wed Sep 14 09:42:09 2016

@author: ellwood
"""

from openpyxl import load_workbook

class NamedArray(object):

    ''' Named range object
    '''
    C_CAPS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'   

    def __init__(self, workbook, named_range_raw):
        ''' Initialise a NameArray object from the named_range_raw information in the given
            workbook
        '''
        self.wb = workbook
        sheet_str, cellrange_str = str(named_range_raw).split('!')
        self.sheet =sheet_str.split('"')[1]
        self.loc = self.wb[self.sheet]
        if ':' in cellrange_str:
            self.has_range = True
            self.has_value = False
            lo,hi = cellrange_str.split(':')
            self.ad_lo = lo.replace('$','')
            self.ad_hi = hi.replace('$','')
        else:
            self.has_range = False
            self.has_value = True
            self.ad_lo = cellrange_str.replace('$','')
            self.ad_hi = self.ad_lo

        self.min_row = self.get_row(self.ad_lo) 
        self.max_row = self.get_row(self.ad_hi)
        self.rows = self.max_row - self.min_row + 1
        self.min_col = self.col_to_n(self.ad_lo)
        self.max_col = self.col_to_n(self.ad_hi)
        self.cols    = self.max_col - self.min_col + 1

    def size_of(self):
        ''' Returns two dimensional size of named space
        '''
        return self.cols, self.rows 

    def cols(self):
        ''' Returns number of cols in named space
        '''
        return self.cols

    def rows(self):
        ''' Returns number of rows in named space
        '''
        return self.rows

    def value(self, r=1, c=1):
       ''' Returns the value at row r, column c
       '''
       if self.has_value:
          return self.loc.cell(self.ad_lo).value

       assert r < self.max_rows
       assert c < self.max_cols
       return self.loc.cell(self.n_to_col(self.min_col + c-1)+str(self.min_row+r-1)).value        

    def is_range(self):
       ''' if true then name defines a table more than 1 cell
       '''
       return self.range

    def is_value(self):
        ''' if true then name defines the location of a single value
        '''
        return None

    def __str__(self):
        ''' printed description of named space
        '''
        locs = 's ' + self.ad_lo + ':' + self.ad_hi if self.is_range else ' ' + self.ad_lo 
        return('named range'+ str(self.size_of()) + ' in sheet ' + self.sheet + ' @ location' + locs)  

    @classmethod
    def get_row(cls, ad):
        ''' get row number from cell string
        Cell string is assumed to be in excel format i.e "ABC123" where row is 123
        '''
        row = 0
        for l in ad:
            if l in "1234567890":
                row = row*10 + int(l)
        return row

    @classmethod
    def col_to_n(cls, ad):
        ''' find column number from xl address
            Cell string is assumed to be in excel format i.e "ABC123" where column is abc
            column number is integer represenation i.e.(A-A)*26*26 + (B-A)*26 + (C-A)
        '''
        n = 0
        for l in ad:
            if l in cls.C_CAPS:
                n = n*26 + cls.C_CAPS.find(l)+1
        return n

    @classmethod
    def n_to_col(cls,n):
        ''' make xl column address from column number
        '''
        ad = ''
        while n > 0:
            ad = cls.C_CAPS[n%26-1] + ad  
            n = n // 26
        return ad


class Struct(object):
    ''' clast which turns a dictionary into a structure
    '''
    def __init__(self, **entries): 
        self.__dict__.update(entries)

    def repr__(self): 
        return '<%s>' % str('\n '.join('%s : %s' % (k, repr(v)) for (k, v) in self.__dict.iteritems())) 


def get_names(workbook):
    ''' Get a structure containing all of the names in the workbook
    '''
    named_ranges = wb.get_named_ranges()    
    name_list = {}
    for named_range in named_ranges:
        name = named_range.name
        if name[0:2] == 'n_':
            # only store the names beginning with 'n_'
            name_list[name[2:]] = NamedArray(wb, str(named_range))
    for item in name_list:
        print (item, '=', name_list[item])
    return Struct(**name_list)

# ------------------
# program example
# -----------------        

wb = load_workbook('test.xlsm', data_only=True) 

n = get_names(wb)
print(n.my_name.value())

One Small optimisation is that I prefixed all of the names I was interested in importing wiht 'n_' so I could then ignore any built in Excel names. I hope this is useful to someone.

Upvotes: 1

Related Questions