Cassandra model paging with REST API

Question

I have model as.

# File: models.py
from uuid import uuid4

from cassandra.cqlengine.models import Model
from cassandra.cqlengine import columns


class StudentModel(Model):
    __table_name__ = 'students'
    id = columns.UUID(primary_key=True, default=uuid4)
    name = columns.Text(index=True, required=True)

    def __json__(self):
        return {'id': str(self.id),
                'name': self.name}

I wrote bottle app which serve data from this model.

# File: app.py
from bottle import run
from bottle import Bottle, request, HTTPResponse

from cassandra.cqlengine import connection
from cassandra.cqlengine.management import sync_table

from models import StudentModel

API = Bottle()

# Create Connection
connection.setup(hosts=['192.168.99.100'],
                 default_keyspace='test',
                 protocol_version=3)

# Sync database table to create table in keyspace
sync_table(StudentModel)

@API.get('/students')
def get_all_students():
    all_objs = StudentModel.all()
    return HTTPResponse(
            body={'data': [x.__json__() for x in all_objs]},
            headers={'content-type': 'application/json'},
            status_code=200)

run(host='localhost',
    port=8080,
    app=API,
    server='auto')

This code works fine, and I get api working as.

curl http://localhost:8080/students -i
HTTP/1.1 200 OK
Content-Length: 74
Content-Type: application/json
Date: Tue, 23 Aug 2016 15:55:23 GMT
Server: waitress
Status-Code: 200

{"data": [{"id": "7f6d18ec-bf24-4583-a06b-b9f55a4dc6e8", "name": "test"}, {"id": "7f6d18ec-bf24-4583-a06b-b9f55a4dc6e9", "name": "test1"}]}

Now I want to add pagging, and want to create api which has limit and offset.

I check Paging Large Queries but it has no example with Model.

Then I change my api to:

# File: app.py
...
...
@API.get('/students')
def get_all_students():
    limit = request.query.limit
    offset = request.query.offset

    all_objs = StudentModel.all()
    if limit and offset:
        all_objs = all_objs[int(offset): int(offset+limit)]

    return HTTPResponse(
            body={'data': [x.__json__() for x in all_objs]},
            headers={'content-type': 'application/json'},
            status_code=200)
...
...

And call api as:

curl "http://localhost:8080/students?limit=1&offset=0" -i
HTTP/1.1 200 OK
Content-Length: 74
Content-Type: application/json
Date: Tue, 23 Aug 2016 16:12:00 GMT
Server: waitress
Status-Code: 200

{"data": [{"id": "7f6d18ec-bf24-4583-a06b-b9f55a4dc6e8", "name": "test"}]}

and

curl "http://localhost:8080/students?limit=1&offset=1" -i
HTTP/1.1 200 OK
Content-Length: 75
Content-Type: application/json
Date: Tue, 23 Aug 2016 16:12:06 GMT
Server: waitress
Status-Code: 200

{"data": [{"id": "7f6d18ec-bf24-4583-a06b-b9f55a4dc6e9", "name": "test1"}]}

I get another solution using has_more_pages and start_fetching_next_page()

from bottle import run
from bottle import Bottle, request, HTTPResponse

from cassandra.cqlengine import connection
from cassandra.query import SimpleStatement
from cassandra.cqlengine.management import sync_table

from models import StudentModel

API = Bottle()

# Create Connection
connection.setup(hosts=['192.168.99.100'],
                 default_keyspace='test',
                 protocol_version=3)

# Sync database table to create table in keyspace
sync_table(StudentModel)

@API.get('/students')
def get_all_students():
    limit = request.query.limit
    offset = request.query.offset

    page = int(request.query.page or 0)

    session = connection.get_session()
    session.default_fetch_size = 1

    objs = StudentModel.all()

    result = objs._execute(objs._select_query())

    data  = []
    count = 0
    while (not page or page > count) and result.has_more_pages:
        count += 1
        if page and page > count:
            result.fetch_next_page()
            continue

        data.extend(result.current_rows)
        result.fetch_next_page()
    all_objs = [StudentModel(**student) for student in data]

    return HTTPResponse(
            body={'data': [x.__json__() for x in all_objs]},
            headers={'content-type': 'application/json'},
            status_code=200)

run(host='localhost',
    port=8080,
    app=API,
    debug=True,
    server='auto')

From above 2 solution, which one is correct ?

Alan Boudreault · Accepted Answer

Currently, there is no efficient way to do pagination with CQLEngine. Using QuerySet slicing works, but be aware that previous pages will still be materialized internally in the result cache. So, this can lead to memory issues and it also affect the request performance. I've created a ticket to analyze a way to fill a single page at a time. You can watch the following ticket:

https://datastax-oss.atlassian.net/browse/PYTHON-627

If you need an immediate efficient pagination support, I suggest to use the core driver instead of cqlengine.

Cassandra model paging with REST API

Answers (1)

Related Questions