Using Dash to upload a PDF and display a pandas dataframe extracted from it

Question

I am extracting data from a PDF and converting it into pandas dataframes. I'm using the code below, adapted from the Dash tutorial, to display the data in a Dash app. What I want to do next is upload the PDF through the app instead of hard-coding its path in the code. I could find similar examples for CSV files, but the same approach doesn't work for a PDF.

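import PyPDF2
from dash import Dash, Input, Output, dash_table, html

pdfFileObj = open('test.pdf', 'rb')
# PdfFileReader is the PyPDF2 < 3.0 name; newer releases call it PdfReader
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
# some operations on the PDF to produce df1 and df2 using PyPDF2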

app = Dash(__name__)
app.layout = html.Div([
    html.H4('Some title'),
    html.P(id='table_out'),
    dash_table.DataTable(
        id='table',
        columns=[{"name": i, "id": i} 
                 for i in df1.columns],
        data=df1.to_dict('records'),
        style_cell=dict(textAlign='left'),
        style_header=dict(backgroundColor="paleturquoise"),
        style_data=dict(backgroundColor="lavender")
    ),
    html.H4("Some title"),
    html.P(id='table_out1'),
    dash_table.DataTable(
        id='table1',
        columns=[{"name": i, "id": i} 
                 for i in df2.columns],
        data=df2.to_dict('records'),
        style_cell=dict(textAlign='left'),
        style_header=dict(backgroundColor="paleturquoise"),
        style_data=dict(backgroundColor="lavender")
    )
])

# One callback per table: each has a single Output, so each returns one value
@app.callback(
    Output('table_out', 'children'),
    Input('table', 'active_cell'))
def update_table_out(active_cell):
    if active_cell:
        # str() keeps numpy values JSON-serializable
        return str(df1.iloc[active_cell['row']][active_cell['column_id']])
    return "Click the table"


@app.callback(
    Output('table_out1', 'children'),
    Input('table1', 'active_cell'))
def update_table_out1(active_cell):
    if active_cell:
        return str(df2.iloc[active_cell['row']][active_cell['column_id']])
    return "Click the table"


app.run_server(debug=True)

Answers (1)
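
A minimal sketch of one way the upload could be wired in, under some assumptions: Dash 2.x, PyPDF2 2.x or newer (where the reader class is `PdfReader`), and a single table. `dcc.Upload` passes the file to a callback through its `contents` property as a base64 data URL, so the PDF has to be decoded into a binary file object before PyPDF2 can read it. The names `decode_upload`, `pdf_to_dataframe`, `build_app` and the id `upload-pdf` are made up for illustration, and `pdf_to_dataframe` is only a placeholder for the extraction step that the question leaves out.

```python
import base64
import io


def decode_upload(contents):
    """Turn the `contents` string from dcc.Upload into a binary file object.

    dcc.Upload delivers the file as a data URL of the form
    "data:<mime type>;base64,<payload>".
    """
    header, _, payload = contents.partition(",")
    if not header.startswith("data:") or "base64" not in header:
        raise ValueError("expected a base64 data URL")
    return io.BytesIO(base64.b64decode(payload))


def pdf_to_dataframe(pdf_file):
    """Hypothetical stand-in for the question's PyPDF2 -> DataFrame step."""
    # Imported here so decode_upload can be used without these packages.
    import pandas as pd
    import PyPDF2

    reader = PyPDF2.PdfReader(pdf_file)  # PdfFileReader in PyPDF2 < 3.0
    rows = [
        {"page": number, "text": page.extract_text()}
        for number, page in enumerate(reader.pages, start=1)
    ]
    return pd.DataFrame(rows)


def build_app():
    """Build a Dash app whose table is filled from an uploaded PDF."""
    from dash import Dash, Input, Output, dash_table, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Upload(
            id="upload-pdf",
            children=html.Button("Upload PDF"),
            accept=".pdf",
        ),
        # The table starts empty; the callback supplies columns and data.
        dash_table.DataTable(id="table", style_cell=dict(textAlign="left")),
    ])

    @app.callback(
        Output("table", "columns"),
        Output("table", "data"),
        Input("upload-pdf", "contents"),
        prevent_initial_call=True,
    )
    def show_pdf(contents):
        # Decode the upload, extract a dataframe, and hand it to the table.
        df = pdf_to_dataframe(decode_upload(contents))
        columns = [{"name": c, "id": c} for c in df.columns]
        return columns, df.to_dict("records")

    return app
```

To try it, call `build_app().run_server(debug=True)`. The main difference from the CSV examples is that the decoded bytes go to PyPDF2 through `io.BytesIO` rather than to `pd.read_csv`. The dataframe is also built inside the callback, so the table's `columns` and `data` become callback outputs instead of being set in the layout.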
