ketan

Reputation: 2904

How to read files in a directory using a glob pattern in Python?

I want to read files from one directory.

The directory contains:

ABC1.csv
ABC1_1.csv
ABC1_2.csv
ABC11.csv
ABC11_1.csv
ABC11_3.csv
ABC11_2.csv
ABC13_4.csv
ABC13_1.csv
ABC17_6.csv
ABC17_2.csv
ABC17_4.csv
ABC17_8.csv

When running the script, I want to pass a command-line argument that selects which files are read, according to these conditions:

  1. If the user gives only ABC, print an error message.
  2. If the user gives ABC1, it must read only ABC1.csv, ABC1_1.csv and ABC1_2.csv.
  3. If the user gives ABC11, it must read only ABC11.csv, ABC11_1.csv, ABC11_2.csv and ABC11_3.csv.
  4. If the user gives ABC13, it must read only ABC13_1.csv and ABC13_4.csv.
  5. If the user gives ABC17, it must read only ABC17_2.csv, ABC17_4.csv, ABC17_6.csv and ABC17_8.csv.

I wrote a script for this, but I'm running into an issue.

Program-

from glob import glob
import os
import sys

file_pattern = ''
files_list = list()
arguments = {'ABC', 'PQR', 'XYZ'}

if len(sys.argv[1:2]) == 1:
   file_pattern = str(sys.argv[1:2])
else:   
   print 'run as <python test.py ABC>'
   sys.exit(1)
if file_pattern in arguments:
   print '<Provide Name with some Number>'
   sys.exit(1)

file_pattern = file_pattern.replace('[','').replace(']','').replace('\'','')

if file_pattern.startswith('ABC',0,3):
   files_list = glob(os.path.join('<directory name>', str(file_pattern)+'_*.csv'))
else:
   print 'No Such File --> ' + str(file_pattern)+ '\t  <Provide appropriate Name>'
   sys.exit(1)

if files_list:
   for a_file in sorted(files_list):
      print a_file
      #process file
else:
   print 'No Such File --> ' + str(file_pattern)+ '\t  <Provide appropriate Name>'
   sys.exit(1)

This code works, but it doesn't satisfy my 2nd condition. When the user passes ABC1 as the argument, i.e. python test.py ABC1, it returns ABC1_1.csv and ABC1_2.csv but not ABC1.csv.

How can I satisfy the 2nd condition without breaking any of the others?

Upvotes: 2

Views: 4205

Answers (3)

Jesper Freesbug

Reputation: 415

You might want to add a simple check for the additional "special" case, something like this:

if file_pattern.startswith('ABC',0,3):
   csv_path = os.path.join('.', str(file_pattern))
   files_list = glob(csv_path + '_*.csv')
   # Just check the special case that's not included in the glob above
   csv_path = csv_path + '.csv'
   if os.path.isfile(csv_path):
      files_list.append(csv_path)
else:
   print 'No Such File --> ' + str(file_pattern)+ '\t  <Provide appropriate Name>'
   sys.exit(1)
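
The same idea can be wrapped in a small helper. This is only a sketch, written in Python 3 syntax rather than the Python 2 used above; the function name find_line_files and its parameters are hypothetical, not part of the original script:

```python
import glob
import os


def find_line_files(directory, name):
    """Return sorted paths for name.csv (if present) plus every name_*.csv."""
    # glob.escape keeps any special characters in the user input literal.
    base = os.path.join(directory, glob.escape(name))
    # The glob covers only the suffixed files, e.g. ABC1_1.csv and ABC1_2.csv.
    matches = glob.glob(base + '_*.csv')
    # The unsuffixed file needs its own check, because the '_' in the
    # pattern is literal and ABC1.csv does not contain it.
    exact = os.path.join(directory, name + '.csv')
    if os.path.isfile(exact):
        matches.append(exact)
    return sorted(matches)
```

An empty result can then be treated as the error case, so an argument such as ABC (with no number) falls through to the error message as long as neither ABC.csv nor ABC_*.csv exists in the directory.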

Upvotes: 0

ketan

Reputation: 2904

I tried several scenarios and finally arrived at a solution that satisfies all my conditions. First I check whether the file named by the user input (e.g. ABC1.csv) exists in the specified directory. If it does, I glob all files with the same name followed by an underscore (_), then append the exact-match file to the same list.

If that file does not exist in the specified directory, I glob only the files with the underscore (_) suffix into the list. Finally, I iterate through the list to get the result.

Program-

from glob import glob
import os
import sys

file_pattern = ''
files_list = list()

arguments = {'ABC', 'PQR', 'XYZ'}

#checking for user provided argument or not
if len(sys.argv[1:2]) == 1:
   file_pattern = str(sys.argv[1:2])
else:   
   print 'run as < python test.py <LineName> >'
   sys.exit(1)
#replace all unnecessary stuff with ('')
file_pattern = file_pattern.replace('[','').replace(']','').replace('\'','')

#checking for line number is provided or not
if file_pattern in arguments:
   print '<Provide LineName with some Number>'
   sys.exit(1)

flag = True
#list all files in the specified directory
files = os.listdir('<directory name>')

for file_name in files:
   if str(file_name) == str(file_pattern)+'.csv':
      files_list = glob(os.path.join('<directory name>', str(file_pattern)+'_*.csv'))
      #appending match file also to resultant list
      files_list.append(os.path.join('<directory name>', file_name))
      flag = False
#if specified file is not present in dir check for filename with (_)
if flag:
   files_list = glob(os.path.join('<directory name>', str(file_pattern)+'_*.csv'))

#checking for list contains items or not
if files_list:
   for a_file in sorted(files_list):
      print a_file
else:
   print 'No Such File --> ' + str(file_pattern)+ '\t  <Provide appropriate Name1>'
   sys.exit(1)

Consider a directory that contains the files ABC1.csv, ABC1_1.csv, ABC1_2.csv, ABC11.csv, ABC11_1.csv, ABC11_3.csv and ABC11_2.csv.

Output scenarios:

#if input is ABC1
.\\ABC1.csv
.\\ABC1_1.csv
.\\ABC1_2.csv
#if input is ABC11
.\\ABC11.csv
.\\ABC11_1.csv
.\\ABC11_2.csv
.\\ABC11_3.csv

Upvotes: 1

dodell

Reputation: 490

I have a solution. It's not perfect, because the pattern is broad enough to pick up unrelated files if the folder contains any:

file_pattern = 'ABC1'
files_list = glob(os.path.join('<directory name>', str(file_pattern)+'[!0-9]*'))
# output: ABC1.csv, ABC1_1.csv, ABC1_2.csv

file_pattern = 'ABC11'
files_list = glob(os.path.join('<directory name>', str(file_pattern)+'[!0-9]*'))
# output: ['.\\ABC11.csv', '.\\ABC11_1.csv', '.\\ABC11_2.csv', '.\\ABC11_3.csv']

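I had the same problem as Jesper. The issue is that the pattern '_*.csv' requires a literal underscore after the name, so ABC1.csv can never match. Note that * on its own matches zero or more characters; in the pattern above it is [!0-9] that must match exactly one character, here either '.' or '_'.

By selecting only files that don't have a digit immediately after the file pattern, we avoid the ABC1/ABC11 clash.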

Upvotes: 0
