Werner Raath

Reputation: 1512

Python asyncio task list progress logging

I need to process a massive amount of data rows and it only makes sense to do this asynchronously.

I need to log progress while the list is processed, e.g. "Done processing 1/3", but when I increment the counter it always stays at 1. This makes sense, since I pass the counter into the function as an argument, so each task only increments its own local value. I added the parameter because without it I would get:

UnboundLocalError: local variable 'processed' referenced before assignment

Using Python 3.8

Any help would be appreciated!

Here's a link to test: https://ideone.com/gRjrf2

I've abstracted my code below:

    import os, logging
    import asyncio
     
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s', datefmt='%d-%b-%y %H:%M:%S')
    logger = logging.getLogger(__name__)
     
    items = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
     
    processed = 0
     
    async def increment(item):
        count = item.get('count', 0)
        count += 1
        return count
     
    async def get_and_update(item, processed):
        item['count'] = await increment(item)
        # Show progress now, but how?
        processed += 1
        logger.info(f"You can't see me {processed}")
     
    async def run():
        logger.info(f"Processing {len(items)} items...")
        await asyncio.gather(*[
            asyncio.create_task(
                get_and_update(item, processed)
            ) for item in items
        ])
     
    loop = asyncio.get_event_loop()
    loop.run_until_complete(run())

The output I get is:

28-Aug-20 11:19:22 INFO     [prog.py:23] Processing 3 items...
28-Aug-20 11:19:22 INFO     [prog.py:20] You can't see me 1
28-Aug-20 11:19:22 INFO     [prog.py:20] You can't see me 1
28-Aug-20 11:19:22 INFO     [prog.py:20] You can't see me 1

Upvotes: 0

Views: 2648

Answers (1)

larsks

Reputation: 312788

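Your basic problem is that by declaring processed as a parameter to get_and_update, you're shadowing the global processed variable. Every call gets its own local name bound to 0, and because integers are immutable, processed += 1 only rebinds that local name, so each task logs 1 and the global is never touched. You need to drop the parameter and then declare processed as global within that function (without the global declaration, the assignment makes processed a local variable, which is where the UnboundLocalError came from), like this: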

    import os, logging
    import asyncio

    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s', datefmt='%d-%b-%y %H:%M:%S')
    logger = logging.getLogger(__name__)

    items = [{"name": "A"}, {"name": "B"}, {"name": "C"}]

    processed = 0

    async def increment(item):
        count = item.get('count', 0)
        count += 1
        return count

    async def get_and_update(item):
        global processed

        item['count'] = await increment(item)
        # Show progress now, but how?
        processed += 1
        logger.info(f"You can't see me {processed}")

    async def run():
        logger.info(f"Processing {len(items)} items...")
        await asyncio.gather(*[
            asyncio.create_task(
                get_and_update(item)
            ) for item in items
        ])

    loop = asyncio.get_event_loop()
    loop.run_until_complete(run())

The output of the above is:

28-Aug-20 08:15:00 INFO     [asynctest.py:25] Processing 3 items...
28-Aug-20 08:15:00 INFO     [asynctest.py:22] You can't see me 1
28-Aug-20 08:15:00 INFO     [asynctest.py:22] You can't see me 2
28-Aug-20 08:15:00 INFO     [asynctest.py:22] You can't see me 3
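
If you would rather avoid the global, one possible alternative (a sketch, not the only way to do it) is to count completions in the caller with asyncio.as_completed, so the worker coroutine needs no counter at all. Here the body of get_and_update is a simplified stand-in for your real work, and the messages list exists only to make the result easy to inspect; both are assumptions for illustration.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


async def get_and_update(item):
    # Simplified stand-in for the real async work (assumption for this sketch).
    await asyncio.sleep(0)
    item["count"] = item.get("count", 0) + 1
    return item


async def run(items):
    messages = []  # collected only so the progress lines can be inspected later
    tasks = [asyncio.create_task(get_and_update(item)) for item in items]
    # as_completed yields awaitables in completion order, so the loop index
    # is the number of items finished so far.
    for done, finished in enumerate(asyncio.as_completed(tasks), start=1):
        await finished
        message = f"Done processing {done}/{len(items)}"
        logger.info(message)
        messages.append(message)
    return messages


items = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
messages = asyncio.run(run(items))
```

The counter lives in one place (the loop in run), so there is nothing to share between tasks. The global version above is also safe under asyncio, since there is no await between reading and writing processed, but that stops being true if you ever move the work to threads.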

Upvotes: 2
