thelaw

Reputation: 588

API GET method to return all tweets with a hashtag count greater than a given number from MongoDB in JSON format

I have a MongoDB database that contains a number of tweets. I want my API to return, as a JSON list, all the tweets that contain more hashtags than a number specified by the user in the URL (e.g. http://localhost:5000/tweets?morethan=5, where the number is 5).

The hashtags are contained inside the entities column in the database, along with other columns such as user_mentions, urls, symbols and media. Here is the code I've written so far, but it doesn't return anything.

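#!flask/bin/python
from flask import Flask, jsonify, request
from pymongo import MongoClient

app = Flask(__name__)
client = MongoClient()  # assumption: MongoDB is running locally on the default port

@app.route('/tweets', methods=['GET'])
def get_tweets():
    # Connect to database and pull back collections
    db = client['mongo']
    collection = db['collection']

    parameter = request.args.get('morethan')
    if not parameter:
        return jsonify([])  # assumption: return an empty list when ?morethan= is missing

    # Array indexes are zero-based, so index N exists only when there are more than N hashtags
    key_im_looking_for = "entities.hashtags.{}".format(int(parameter))  # create the namespace
    # Exclude _id because ObjectId values are not JSON serializable
    cursor = collection.find({key_im_looking_for: {"$exists": True}}, {"_id": False})
    return jsonify(list(cursor))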

EDIT: IT WORKS!

Upvotes: 0

Views: 355

Answers (1)

bauman.space

Reputation: 2023

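The line at fault is this one. It matches only documents whose entities subdocument is exactly {"hashtags": parameter}, so it returns nothing: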

cursor = collection.find({"entities": {"hashtags": parameter}})

The answer to "mongodb query: $size with $gt returns always 0" explains why you cannot do what you ask directly: the $size query operator matches only an exact array length and does not accept a range such as $gt.

That answer also describes possible (but poor) workarounds.

The best suggestion is to modify all your documents to add a "num_hashtags" key, index it, and query against it.

Using the Twitter JSON API, you could update all your documents and put the num_hashtags key in the entities document.
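As a rough sketch of that suggestion, the counter could be backfilled from the hashtags array already stored in each document. The helper names add_hashtag_counts and count_filter are made up for illustration, and the collection argument is assumed to behave like a pymongo Collection (only find and update_one are used).

```python
def count_filter(more_than):
    """Build a filter matching tweets with more than `more_than` hashtags."""
    return {"entities.num_hashtags": {"$gt": int(more_than)}}


def add_hashtag_counts(collection):
    """Store the hashtag count on every document that lacks the counter.

    `collection` is assumed to behave like a pymongo Collection.
    Returns the number of documents updated.
    """
    updated = 0
    missing = {"entities.num_hashtags": {"$exists": False}}
    # Fetch only the hashtags array (plus _id) for documents without a counter.
    for doc in collection.find(missing, {"entities.hashtags": 1}):
        hashtags = (doc.get("entities") or {}).get("hashtags") or []
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"entities.num_hashtags": len(hashtags)}},
        )
        updated += 1
    return updated
```

With pymongo, this would be followed by collection.create_index("entities.num_hashtags") and then collection.find(count_filter(5)). New tweets would need the counter set at insert time for the query to stay accurate.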

Alternatively, you could solve your immediate problem with a very slow full collection scan on every query, abusing MongoDB dot notation to check whether the array element at index N exists, where N is your parameter. Array indexes are zero-based, so "entities.hashtags.N" exists only in documents with more than N hashtags.

key_im_looking_for = "entities.hashtags.{}".format(int(parameter))  # create the namespace
# equivalent with %-formatting: key_im_looking_for = "entities.hashtags.%s" % int(parameter)
# in this example it would be "entities.hashtags.5", which exists only when there are at least 6 hashtags
cursor = collection.find({key_im_looking_for: {"$exists": True}})
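Because array positions in dot notation are zero-based, the position to test for "more than N hashtags" is N itself. Here is a small sketch of that reasoning. The function names are hypothetical, and index_exists is only an in-memory stand-in for the check MongoDB performs server-side.

```python
def more_than_filter(more_than):
    """Build the dot-notation filter for "more than `more_than` hashtags"."""
    n = int(more_than)
    if n < 0:
        raise ValueError("morethan must be non-negative")
    # "entities.hashtags.<n>" addresses the element at zero-based index n.
    return {"entities.hashtags.{}".format(n): {"$exists": True}}


def index_exists(doc, n):
    """In-memory stand-in for the server-side check, for illustration only."""
    hashtags = (doc.get("entities") or {}).get("hashtags") or []
    return n < len(hashtags)


# A sample tweet with exactly 6 hashtags.
tweet = {"entities": {"hashtags": [{"text": t} for t in "abcdef"]}}

print(more_than_filter("5"))
print(index_exists(tweet, 5))  # True: 6 hashtags is more than 5
print(index_exists(tweet, 6))  # False: 6 hashtags is not more than 6
```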

The best answer (and the key reason to use a NoSQL database in the first place) is to shape your data to suit your retrieval. If possible, you should perform an in-place update that adds the num_hashtags key.

Upvotes: 1
