Creating Pandas DataFrame from SmartSheet API (nested, awkward, JSON)

Question

I'm trying to connect to my office's SmartSheet API via Python to build some performance tracking dashboards that use data from outside SmartSheet. All I want is a simple DataFrame whose columns correspond to each cell's columnId and whose values come from the displayValue key in the Smartsheet response. I'm calling the API with a plain requests.get rather than following SmartSheet's API documentation, because I've found the latter harder to work with.

The table (sample) is set up as:

Number  Letter  Name
1       A       Joe
2       B       Jim
3       C       Jon

The JSON returned by the sheet GET request (shown here as the parsed Python dict) is:

{'id': 339338304219012,
 'name': 'Sample Smartsheet',
 'version': 1,
 'totalRowCount': 3,
 'accessLevel': 'OWNER',
 'effectiveAttachmentOptions': ['GOOGLE_DRIVE',
  'EVERNOTE',
  'DROPBOX',
  'ONEDRIVE',
  'LINK',
  'FILE',
  'BOX_COM',
  'EGNYTE'],
 'ganttEnabled': False,
 'dependenciesEnabled': False,
 'resourceManagementEnabled': False,
 'cellImageUploadEnabled': True,
 'userSettings': {'criticalPathEnabled': False, 'displaySummaryTasks': True},
 'userPermissions': {'summaryPermissions': 'ADMIN'},
 'hasSummaryFields': False,
 'permalink': 'https://app.smartsheet.com/sheets/5vxMCJQhMV7VFFPMVfJgg2hX79rj3fXgVGG8fp61',
 'createdAt': '2020-02-13T16:32:02Z',
 'modifiedAt': '2020-02-14T13:15:18Z',
 'isMultiPicklistEnabled': True,
 'columns': [{'id': 6273865019090820,
   'version': 0,
   'index': 0,
   'title': 'Number',
   'type': 'TEXT_NUMBER',
   'primary': True,
   'validation': False,
   'width': 150},
  {'id': 4022065205405572,
   'version': 0,
   'index': 1,
   'title': 'Letter',
   'type': 'TEXT_NUMBER',
   'validation': False,
   'width': 150},
  {'id': 8525664832776068,
   'version': 0,
   'index': 2,
   'title': 'Name',
   'type': 'TEXT_NUMBER',
   'validation': False,
   'width': 150}],
 'rows': [{'id': 8660990817003396,
   'rowNumber': 1,
   'expanded': True,
   'createdAt': '2020-02-14T13:15:18Z',
   'modifiedAt': '2020-02-14T13:15:18Z',
   'cells': [{'columnId': 6273865019090820, 'value': 1.0, 'displayValue': '1'},
    {'columnId': 4022065205405572, 'value': 'A', 'displayValue': 'A'},
    {'columnId': 8525664832776068, 'value': 'Joe', 'displayValue': 'Joe'}]},
  {'id': 498216492394372,
   'rowNumber': 2,
   'siblingId': 8660990817003396,
   'expanded': True,
   'createdAt': '2020-02-14T13:15:18Z',
   'modifiedAt': '2020-02-14T13:15:18Z',
   'cells': [{'columnId': 6273865019090820, 'value': 2.0, 'displayValue': '2'},
    {'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'},
    {'columnId': 8525664832776068, 'value': 'Jim', 'displayValue': 'Jim'}]},
  {'id': 5001816119764868,
   'rowNumber': 3,
   'siblingId': 498216492394372,
   'expanded': True,
   'createdAt': '2020-02-14T13:15:18Z',
   'modifiedAt': '2020-02-14T13:15:18Z',
   'cells': [{'columnId': 6273865019090820, 'value': 3.0, 'displayValue': '3'},
    {'columnId': 4022065205405572, 'value': 'C', 'displayValue': 'C'},
    {'columnId': 8525664832776068, 'value': 'Jon', 'displayValue': 'Jon'}]}]}

Here are the two ways I've approached the problem:

INPUT:

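import pandas as pd

samplej = sample.json()  # sample is the response returned by requests.get
# pd.json_normalize (pandas >= 1.0) replaces the deprecated pandas.io.json.json_normalize
s_rows = pd.json_normalize(data=samplej['rows'], record_path='cells', meta=['id', 'rowNumber'])
s_rows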

OUTPUT:

A DataFrame with columnId, value, displayValue, id, and rowNumber as separate columns, one row per cell.

If I could figure out how to transpose this data in the right way I could probably make it work, but that seems incredibly complicated.

INPUT:

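import pandas as pd

samplej = sample.json()  # sample is the response returned by requests.get
cellist = []
def get_cells():
    # Collect the list of cell dicts for each row of the sheet
    srows = samplej['rows']
    for s_cells in srows:
        scells = s_cells['cells']
        cellist.append(scells)
get_cells()
celldf = pd.DataFrame(cellist)
celldf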

OUTPUT:

This returns a DataFrame with the correct number of columns and rows, but each cell is populated with a dictionary that looks like this:

In [14]:
celldf.loc[1,1]
Out [14]:
{'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'}

If there were a way to strip everything except the value of the displayValue key from every cell, this would probably solve my problem. Again, though, it seems weirdly complicated.

I'm fairly new to Python and to working with APIs, so there may be a simple solution I'm overlooking. If you have a suggestion for making either of the approaches above work, I'm all ears. Thanks for your help!

Serge Ballesta · Accepted Answer

You need to use the columns field of the response, which maps each column id to its title:

import pandas as pd

# Map each column id to its title, and keep the titles in sheet order
colnames = {x['id']: x['title'] for x in samplej['columns']}
columns = [x['title'] for x in samplej['columns']]
# Build one {title: displayValue} dict per row
cellist = [{colnames[scells['columnId']]: scells['displayValue']
            for scells in s_cells['cells']} for s_cells in samplej['rows']]
celldf = pd.DataFrame(cellist, columns=columns)

This gives the expected result:

  Number Letter Name
0      1      A  Joe
1      2      B  Jim
2      3      C  Jon
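
As an alternative, the long-format frame from the question's first attempt can also be reshaped with a pivot. The following is only a minimal sketch, assuming pandas 1.0 or later (where pd.json_normalize is available) and using a trimmed copy of the sample payload in place of sample.json(); the names long_df, wide, titles and order are illustrative:

```python
import pandas as pd

# Trimmed copy of the sample payload from the question (stand-in for sample.json())
samplej = {
    "columns": [
        {"id": 6273865019090820, "index": 0, "title": "Number"},
        {"id": 4022065205405572, "index": 1, "title": "Letter"},
        {"id": 8525664832776068, "index": 2, "title": "Name"},
    ],
    "rows": [
        {"rowNumber": 1, "cells": [
            {"columnId": 6273865019090820, "value": 1.0, "displayValue": "1"},
            {"columnId": 4022065205405572, "value": "A", "displayValue": "A"},
            {"columnId": 8525664832776068, "value": "Joe", "displayValue": "Joe"}]},
        {"rowNumber": 2, "cells": [
            {"columnId": 6273865019090820, "value": 2.0, "displayValue": "2"},
            {"columnId": 4022065205405572, "value": "B", "displayValue": "B"},
            {"columnId": 8525664832776068, "value": "Jim", "displayValue": "Jim"}]},
        {"rowNumber": 3, "cells": [
            {"columnId": 6273865019090820, "value": 3.0, "displayValue": "3"},
            {"columnId": 4022065205405572, "value": "C", "displayValue": "C"},
            {"columnId": 8525664832776068, "value": "Jon", "displayValue": "Jon"}]},
    ],
}

# One record per cell, tagged with the row it belongs to
long_df = pd.json_normalize(samplej["rows"], record_path="cells", meta=["rowNumber"])

# Reshape: one row per rowNumber, one column per columnId
wide = long_df.pivot(index="rowNumber", columns="columnId", values="displayValue")

# Restore the sheet's column order and replace ids with titles
titles = {c["id"]: c["title"] for c in samplej["columns"]}
order = [c["id"] for c in samplej["columns"]]
wide = wide[order].rename(columns=titles)
wide.columns.name = None
wide = wide.reset_index(drop=True)
```

The dict-comprehension approach above is shorter; the pivot version may be preferable if the long-format frame is already needed for something else.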

If some cells may contain only a columnId and no displayValue field, replace scells['displayValue'] in the code above with scells.get('displayValue', defaultValue), where defaultValue can be None, np.nan or any other suitable default.
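
A minimal sketch of that variant, assuming np.nan as the default and a small hypothetical payload in which one cell has no displayValue key (the trimmed samplej below stands in for sample.json()):

```python
import numpy as np
import pandas as pd

# Hypothetical trimmed payload: the second row's Letter cell is empty
samplej = {
    "columns": [
        {"id": 6273865019090820, "title": "Number"},
        {"id": 4022065205405572, "title": "Letter"},
    ],
    "rows": [
        {"cells": [
            {"columnId": 6273865019090820, "value": 1.0, "displayValue": "1"},
            {"columnId": 4022065205405572, "value": "A", "displayValue": "A"}]},
        {"cells": [
            {"columnId": 6273865019090820, "value": 2.0, "displayValue": "2"},
            {"columnId": 4022065205405572}]},  # no displayValue here
    ],
}

colnames = {x["id"]: x["title"] for x in samplej["columns"]}
columns = [x["title"] for x in samplej["columns"]]

# dict.get falls back to np.nan instead of raising KeyError on empty cells
cellist = [
    {colnames[scells["columnId"]]: scells.get("displayValue", np.nan)
     for scells in s_cells["cells"]}
    for s_cells in samplej["rows"]
]
celldf = pd.DataFrame(cellist, columns=columns)
```

With np.nan as the default, the empty cells can later be handled with the usual pandas tools such as isna or fillna.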
