Reputation: 294
This is not a typical code problem; it's a design problem I'm facing right now.
Let's say there is a webpage (which is not mine) from which I'd like to scrape a few pieces of information. The most important information for me is when (datetime) a character logged in and when he logged off, but I collect other information as well. The login time is known from point 2 (see below), but the logout time I have to calculate. I can access 2 pages:
I perform only 2 "operations" in tasks.py, which are:
So, this is how it works now (each minute, thanks to Celery, I do this):
The problem is that I'm unsure whether that's a good idea. My models.py (the comments are just to clarify what I'm doing):
from django.db import models

# SEX_CHOICES, VOCATION_CHOICES and ONLINE_CHOICES are defined elsewhere (not shown).


class Guild(models.Model):
    name = models.CharField(max_length=100)


class Player(models.Model):
    # FK:
    guild = models.CharField(max_length=50, null=True, blank=True)  # Does he have guild?
    name = models.CharField(max_length=100, unique=True)
    sex = models.CharField(choices=SEX_CHOICES, max_length=7)  # Male / Female
    level = models.PositiveSmallIntegerField()
    vocation = models.CharField(choices=VOCATION_CHOICES, max_length=50)  # His class
    status = models.CharField(choices=ONLINE_CHOICES, max_length=10)  # Offline / Online
    lastlogin = models.DateTimeField()

    def __str__(self):
        return self.name


class Deaths(models.Model):
    text = models.CharField(max_length=500)
    killed = models.ForeignKey(Player, null=True, on_delete=models.CASCADE, related_name='killed')  # Who got killed
    killer = models.ForeignKey(Player, null=True, on_delete=models.CASCADE, related_name='killer')  # Who killed him
    date = models.DateTimeField()  # When he died?
    level = models.PositiveSmallIntegerField()  # On which level player died
    pvp = models.BooleanField()  # Death was due to PvP or PvE?

    class Meta:
        ordering = ('date',)


class OnlineDetails(models.Model):
    player = models.ForeignKey(Player, on_delete=models.CASCADE)
    login = models.DateTimeField()  # When he logged in
    logout = models.DateTimeField(null=True, blank=True)  # When he logged off

    def __str__(self):
        # Show the logout time only once it is known
        return (self.player.name + " " + str(self.logout)) if self.logout else self.player.name

    class Meta:
        ordering = ('logout', 'login')
It works, but I was wondering whether it's the best way to do so. Actually, I think this approach is bad, because I have to scan ~500 characters every minute, which is hard to do with the site's anti-DDoS shield.
Do you have a better solution, or a technology I should pick up? I'm not the best at Python or Django; I'm still learning.
Upvotes: 1
Views: 212
Reputation: 2483
Sure, you can measure the whole process to see how long it takes, but I think updating ~500 entries takes only a few milliseconds. The bigger problem could be scraping 500 entries every minute: that means sending about 8 requests per second (500 / 60 ≈ 8.3; this is based on point 2, not point 1). I assume you are scraping point 1 every minute and, on a change, scraping the missing characters. Point 1 is not a problem at all. Parsing that many pages could be hard, but it's not impossible.

I also suggest downloading the pages and storing them for some period of time, in case anything fails during the process. It's also faster to download the pages and parse them in parallel in another thread, because the most time-consuming part is sending the request and downloading the response.

As for transaction autocommit: it could be a problem in a multithreaded environment. You should measure the process with and without it to see whether it is worth the risk of not knowing what is happening.
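
To make the idea concrete, here is a minimal sketch of "poll the online list, then fetch only the characters whose status changed". It is not taken from your code: the names `diff_online`, `fetch_all`, `poll_once` and the `fetch` callable are hypothetical, and it assumes the logout time can be approximated by the time of the poll in which the character disappeared from the list (so it is accurate to about one polling interval). The pages are returned unparsed so that they can be stored and parsed later in a separate step.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def diff_online(previous, current):
    """Compare two snapshots of the online list.

    Returns (logged_in, logged_out) as sets of character names.
    """
    previous, current = set(previous), set(current)
    return current - previous, previous - current


def fetch_all(names, fetch, max_workers=8):
    """Download the pages for `names` concurrently.

    `fetch` is a hypothetical callable (name -> raw HTML). Nothing is
    parsed here; the raw pages are returned keyed by character name.
    """
    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch, names))
    return dict(zip(names, pages))


def poll_once(previous, current, fetch, now=None):
    """One polling step: find changes, then fetch only the changed characters."""
    # Assumption: the poll time stands in for the real login/logout time.
    now = now or datetime.now(timezone.utc)
    logged_in, logged_out = diff_online(previous, current)
    pages = fetch_all(sorted(logged_in | logged_out), fetch)
    return {
        "time": now,
        "logged_in": logged_in,
        "logged_out": logged_out,
        "pages": pages,
    }
```

With this shape, a minute in which nobody logs in or out costs one request (the online list) instead of ~500, and `max_workers` caps how many requests are in flight at once, which may help with the rate limiting you mentioned.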
Upvotes: 1