devaerial

Reputation: 2199

os.walk() visits same folder twice

I'm writing a simple script using the mutagen library that counts the audio files in a folder and the total play time of that folder (including audio files in subfolders).

import os,sys
from datetime import datetime,timedelta
from mutagen.mp3 import MP3
from mutagen.flac import FLAC
from mutagen.aac import AAC
from mutagen.aiff import AIFF
from mutagen.asf import ASF

audio_ext={"mp3":lambda x: MP3(x).info.length,
           "aac":lambda x: AAC(x).info.length,
           "wmv":lambda x: ASF(x).info.length,
           "wma":lambda x: ASF(x).info.length,
           "asf":lambda x: ASF(x).info.length,
           "flac":lambda x: FLAC(x).info.length,
           "aiff":lambda x: AIFF(x).info.length,}

def scan_lib(path):
    playtime = 0
    audio_files = 0
    for root,dirs,files in os.walk(path,followlinks=False):
        for f in files:
           try:
               playtime += audio_ext[f[len(f)-f[::-1].index('.'):]](os.path.join(root,f))
               audio_files += 1
           except (KeyError,ValueError):
               pass

        for d in dirs:
            dir_playtime,dir_audios = scan_lib(os.path.join(root,d))
            playtime +=dir_playtime
            audio_files += dir_audios

    print("\nLibrary:",path)
    print("Amount of audio files:",audio_files)
    print("Total playing time:\nDays\tHours\tMin\tSec\n%d\t%d\t%d\t%d\n" % convert_pt(playtime))
    return playtime,audio_files

def convert_pt(sec):
    t = datetime(1,1,1) + timedelta(seconds=int(sec))
    return t.day-1, t.hour,t.minute,t.second

main_path = sys.argv[1]
playtime,audio_files = scan_lib(main_path)

After some tests I figured out that my script visits some folders twice. Usually those directories are subfolders of other subfolders. As a result, it prints output like this:

$ python3 music_scan.py 

Library: ~/Music/
Amount of audio files: 3520
Total playing time:
Days    Hours   Min Sec
9   7   30  26

But if I move all the audio tracks into one folder and run the script on that test folder, it shows a different result:

$ python3 music_scan.py ~/test
Library: ~/test/
Amount of audio files: 885
Total playing time:
Days    Hours   Min Sec
2   15  49  9

Indeed, the number of audio tracks in the test folder was 885; I checked it with the ls | wc -l command. So why does os.walk() visit some subfolders twice?

Upvotes: 0

Views: 1555

Answers (1)

phihag

Reputation: 287855

os.walk already recursively walks the entire directory tree.

You, however, recursively call your method scan_lib:

def scan_lib(path):
    ...
    for root,dirs,files in os.walk(path,followlinks=False):
        ...
        for d in dirs:
            dir_playtime,dir_audios = scan_lib(os.path.join(root,d))
            ...

Either use os.listdir instead of os.walk and keep the recursive calls, or simply remove the 4 lines starting with for d in dirs:.
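To make the two options concrete, here is a minimal sketch. It counts every file instead of reading audio lengths with mutagen, so that it runs without any third-party library; the function names (count_files_walk, count_files_listdir, count_files_buggy, make_demo_tree) are hypothetical and are not part of the original script.

```python
import os
import tempfile


def count_files_walk(path):
    """Option 1: let os.walk do the recursion; no call to ourselves."""
    total = 0
    for root, dirs, files in os.walk(path, followlinks=False):
        total += len(files)
    return total


def count_files_listdir(path):
    """Option 2: os.listdir lists one level only, so we recurse ourselves."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            total += count_files_listdir(full)
        elif os.path.isfile(full):
            total += 1
    return total


def count_files_buggy(path):
    """The pattern from the question: os.walk plus recursion.

    Nested folders are reached both by os.walk and by the recursive call,
    so their files are counted more than once.
    """
    total = 0
    for root, dirs, files in os.walk(path, followlinks=False):
        total += len(files)
        for d in dirs:
            total += count_files_buggy(os.path.join(root, d))
    return total


def make_demo_tree(top):
    """Create top/track.mp3, top/sub/track.mp3 and top/sub/deep/track.mp3."""
    deep = os.path.join(top, "sub", "deep")
    os.makedirs(deep)
    for folder in (top, os.path.join(top, "sub"), deep):
        with open(os.path.join(folder, "track.mp3"), "w"):
            pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        make_demo_tree(top)
        print("os.walk only:       ", count_files_walk(top))
        print("os.listdir + recurse:", count_files_listdir(top))
        print("os.walk + recurse:   ", count_files_buggy(top))
```

On the three-file demo tree, both correct versions report 3 files, while the os.walk-plus-recursion version reports more, and the overcount grows with nesting depth. That matches the observation that the affected folders are subfolders of other subfolders.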

Upvotes: 4
