Reputation: 349
I am trying to move all files with similar names into a separate folder, but I can't figure out their names in order to move them.
In the code below I create a folder named Duplicate in my working directory and then pass each file name through split to get the middle part of the name: xmlName = xml.split('.')[1]. Now xmlName holds only the part of the file name that helps me decide which files are duplicates.
Below is the list of file in working directory:
# ls
CRON.JC_ADA_SOURCE_DLOAD.xml Duplicate TERA.SC_CACHE_PURGE_01.xml
CRON.JC_ADA_SOURCE_WLOAD.xml POWE.BI_RUN_INFO_WKFLW_INF1.xml test.py
CRON.SC_ADA_CLEANUP_SCRIPT.xml POWE.JC_ADA_SOURCE_DLOAD.xml Unknown
CRON.SC_CACHE_PURGE_01.xml POWE.SC_CHECK_ADA_DATA_FILE_INF2.xml
#
Below is the code (where I am not sure how to list only the duplicate files).
#!/usr/bin/python
import os, sys

Working_Dir = "/home/export/Partition/JobDefinition"

if not os.path.exists('./Duplicate'):
    os.makedirs('./Duplicate', 0755)

for path, dir, files in os.walk(Working_Dir):
    for xml in files:
        xmlName = xml.split('.')[1]
        if xmlName == xmlName:
            print xmlName
Output:
# python test.py
SC_ADA_CLEANUP_SCRIPT
SC_CHECK_ADA_DATA_FILE_INF2
JC_ADA_SOURCE_WLOAD
BI_RUN_INFO_WKFLW_INF1
JC_ADA_SOURCE_DLOAD
SC_CACHE_PURGE_01
JC_ADA_SOURCE_DLOAD
SC_CACHE_PURGE_01
py
#
The output I need is the names below, so that I can move the respective files to the Duplicate folder:
JC_ADA_SOURCE_DLOAD
SC_CACHE_PURGE_01
Upvotes: 1
Views: 195
Reputation: 6587
If you are trying to find the duplicate elements in your list and create another list containing only those duplicates, this is how you can do it.
Here I have a list a with two duplicate elements in it, 2 and 3. I find those elements in list a and create another list b which contains only those two elements.
import collections
a = [1,2,3,4,5,6,2,3,]
b = [item for item, count in collections.Counter(a).items() if count > 1]
When you print b, the output is:
[2, 3]
Then, later, if you also want to remove the duplicate elements from a, you can use set to do it like this:
a = set([1,2,3,4,5,6,2,3,])
Now when you print a, the output is:
set([1, 2, 3, 4, 5, 6])
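Applied to the file names from the question, the same Counter idea produces exactly the two names you want (a sketch; the names are hard-coded here rather than read from the directory):

```python
import collections

files = [
    "CRON.JC_ADA_SOURCE_DLOAD.xml", "CRON.JC_ADA_SOURCE_WLOAD.xml",
    "CRON.SC_ADA_CLEANUP_SCRIPT.xml", "CRON.SC_CACHE_PURGE_01.xml",
    "POWE.BI_RUN_INFO_WKFLW_INF1.xml", "POWE.JC_ADA_SOURCE_DLOAD.xml",
    "POWE.SC_CHECK_ADA_DATA_FILE_INF2.xml", "TERA.SC_CACHE_PURGE_01.xml",
]

# Count the middle part of each name and keep only names seen more than once
names = [f.split('.')[1] for f in files]
duplicates = [name for name, count in collections.Counter(names).items() if count > 1]
print(sorted(duplicates))  # ['JC_ADA_SOURCE_DLOAD', 'SC_CACHE_PURGE_01']
```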
Upvotes: 0
Reputation: 3334
The Lazy Answer
collections.Counter will do what you want, by magic.
import collections
import os

# Working_Dir as defined in the question
c = collections.Counter()
for path, dirs, files in os.walk(Working_Dir):
    c += collections.Counter(xml.split('.')[1] for xml in files)
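The counter c then holds one count per name; filtering it for counts above one gives the duplicates. A sketch, with hard-coded counts standing in for a real directory walk:

```python
import collections

# Stand-in for the counter built from os.walk above
c = collections.Counter({"JC_ADA_SOURCE_DLOAD": 2, "SC_CACHE_PURGE_01": 2,
                         "SC_ADA_CLEANUP_SCRIPT": 1})

# Names that occurred more than once are the duplicates
duplicates = [name for name, count in c.items() if count > 1]
print(sorted(duplicates))  # ['JC_ADA_SOURCE_DLOAD', 'SC_CACHE_PURGE_01']
```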
The Somewhat Less Lazy Answer
Keep track of every unique file name with a set:
seen = set()
duplicates = set()

for path, dirs, files in os.walk(Working_Dir):
    for xml in files:
        xmlName = xml.split('.')[1]
        if xmlName in seen:
            duplicates.add(xmlName)
        seen.add(xmlName)
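Once duplicates is filled, moving the matching files is a shutil.move away. A self-contained sketch, using a temporary directory seeded with a few of the question's file names so it can run anywhere:

```python
import os
import shutil
import tempfile

# Build a throwaway directory with some file names from the question
work = tempfile.mkdtemp()
for name in ["CRON.JC_ADA_SOURCE_DLOAD.xml", "POWE.JC_ADA_SOURCE_DLOAD.xml",
             "CRON.SC_ADA_CLEANUP_SCRIPT.xml"]:
    open(os.path.join(work, name), "w").close()

dup_dir = os.path.join(work, "Duplicate")
os.makedirs(dup_dir)

# Same seen/duplicates idea as above
seen = set()
duplicates = set()
for path, dirs, files in os.walk(work):
    for xml in files:
        xmlName = xml.split('.')[1]
        if xmlName in seen:
            duplicates.add(xmlName)
        seen.add(xmlName)

# Move every file whose middle part is a duplicate into Duplicate/
for xml in os.listdir(work):
    if xml.endswith('.xml') and xml.split('.')[1] in duplicates:
        shutil.move(os.path.join(work, xml), dup_dir)

print(sorted(os.listdir(dup_dir)))
```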
Upvotes: 1
Reputation: 49803
If you only want duplicates, you can store names as you find them in some container (a set would be most appropriate, but a list will do); if a name you are about to add is already there, it is a duplicate.
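That check can be sketched in a few lines (the names here are illustrative):

```python
seen = set()
dups = []
names = ["JC_ADA_SOURCE_DLOAD", "SC_CACHE_PURGE_01", "JC_ADA_SOURCE_DLOAD"]
for name in names:
    # If the name is already in `seen`, this occurrence is a duplicate
    if name in seen:
        dups.append(name)
    seen.add(name)
print(dups)  # ['JC_ADA_SOURCE_DLOAD']
```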
Upvotes: 0