Reputation: 1237
I have a flask app where users can tag their posts with keywords. These tags are separated by a space and are stored in the below model in the tags
column.
class Post(db.Model):
id = db.Column(db.Integer, primary_key=True)
title = db.Column(db.String(100), unique=False, nullable=False)
date_posted = db.Column(db.DateTime, nullable=False, default=datetime.utcnow)
content = db.Column(db.Text, unique=True, nullable=False)
urls = db.Column(db.String(1000), unique=False, nullable=True)
user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
tags = db.Column(db.String(400), unique=False, nullable=True)
upvotes = db.Column(db.Integer, unique=False, nullable=False, default=0)
def __repr__(self):
return f"Post ('{self.title}', '{self.date_posted}')"
I would like to get a table that I can publish to an html page that lists out all of the unique tags and the number of times they occur (sorted by occurrences, descending).
Attempt So, in order to this, I figured I need to 1). get all the tags; 2). loop through them to split by the space (tag.split(' ')); 3). append them back to a dataframe; 4). loop through to count the occurrences.
This does not seem like a very efficient way to do it. But I have tried it nonetheless and this is what I have.
Get all tags
all_tags = Post.query.with_entities(Post.tags)
Get all unique tags
unique_tags = Post.query.with_entities(Post.tags).distinct()
Get tag counts (does not work)
counts = pd.Series(Post.query.with_entities(Post.tags)).value_counts()
but this counts the entire tag string, not the individual tags that are separated by ' '. I feel like this should be straightforward but I cant quite work it out. How can I do this?
Here is some dummy data to play with:
df = pd.DataFrame({'Index': {0: 0, 1: 1, 2: 2, 3: 3},
'tags': {0: 'tag1 tag2',
1: 'tag3 tag4',
2: 'tag5 tag6 tag7',
3: 'tag2 tag3 tag4 tag1'}})
Where the desired result is something like...
{'tag1': {0: 'tag2', 1: 'tag3', 2: 'tag4', 3: 'tag5', 4: 'tag6', 5: 'tag7'},
'2': {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}}
Upvotes: 0
Views: 232
Reputation: 25259
Is this what you want:
pd.Series(df.tags.str.split(' ').sum()).value_counts()
Out[659]:
tag1 2
tag4 2
tag3 2
tag2 2
tag6 1
tag7 1
tag5 1
dtype: int64
Upvotes: 1