thinkingmonster

Reputation: 5413

Check md5sum of all files in a directory in Python

I have too many duplicate songs on my hard disk and want to remove them. I want to write a Python program that computes the md5sum of every file and removes files with identical md5sums, keeping at least one copy. The problem is that the music files have spaces in their names, so the path is not evaluated correctly. Below is my code to compute the md5sum:

__author__ = 'akthakur'

import os
import subprocess

def listFiles(location):
    for name in os.listdir(location):
        path = os.path.join(location, name)
        if os.path.isdir(path):
            listFiles(path)
        else:
            cmd = 'md5sum ' + path
            print(cmd)
            fp = os.popen(cmd)
            print(fp.read())
            fp.close()

listFiles(r"E:\offwork")

I want to pass the path of my E: drive to the listFiles function and have it print the md5sum of every file in every directory, recursively.

Spaces in file names are causing serious trouble. Is there any way of dealing with them?

Upvotes: 0

Views: 227

Answers (2)

Cu3PO42

Reputation: 1473

You call md5sum as an external program. That means a file path needs to be wrapped in quotes if it contains a space; otherwise the shell treats each space-separated piece as a separate argument. Please change

cmd= 'md5sum ' + path

to

cmd = 'md5sum "' + path + '"'
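To see why the quotes matter, here is a small sketch that uses the standard `shlex` module to split a command string the way a POSIX shell would. The path is a made-up example, and the Windows command interpreter has its own parsing rules, so treat this as an illustration of the general idea rather than an exact model of what happens on your machine.

```python
import shlex

# Hypothetical path containing a space (forward slashes keep the demo simple).
path = "E:/offwork/my song.mp3"

# Without quotes, the space splits the path into two separate arguments.
unquoted = shlex.split("md5sum " + path)

# With quotes, the whole path stays together as one argument.
quoted = shlex.split('md5sum "' + path + '"')

print(unquoted)  # ['md5sum', 'E:/offwork/my', 'song.mp3']
print(quoted)    # ['md5sum', 'E:/offwork/my song.mp3']
```

In the unquoted case md5sum is asked for two files that do not exist, which matches the symptom described in the question.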

Upvotes: 1

Gecko

Reputation: 1408

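You can pass the command as a list of arguments to subprocess instead of building a single string. Each list element reaches md5sum as exactly one argument, so a path containing spaces needs no quoting. Note that os.popen only accepts a command string, so use the subprocess module (which you already import), and do not split the path on spaces yourself.

https://docs.python.org/2/library/subprocess.html

output = subprocess.check_output(['md5sum', path])

print(output)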
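If calling the external md5sum program is not a requirement, the checksums can also be computed inside Python with the standard `hashlib` module, which avoids the quoting problem altogether because no command line is built. The following is only a minimal sketch of that alternative, not the approach from the question: the function names `md5_of_file` and `find_duplicates` are made up for illustration, and it uses `os.walk` in place of the hand-written recursion.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large music files are not loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each digest to the sorted paths under root that share it."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[md5_of_file(path)].append(path)
    # Keep only digests that occur more than once.
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Calling `find_duplicates(r"E:\offwork")` would return the groups of files with identical content. This sketch only reports duplicates; deleting all but one path in each group is left as a separate, deliberate step.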

Upvotes: 0
