Reputation: 321
I'm trying to extract some zip files and, after extraction, read only specific CSV files from the extracted folder. The files I want follow a pattern, but I couldn't work it into my code: I'm looking for the files whose names are purely numeric. Here is my code:
import glob
import pandas as pd
import zipfile
from os import listdir
from os.path import isfile, join
path = r'C:\Users\han\Desktop\Selenium'
all_files = glob.glob(path + "/*.zip")
li = []
# Extract every archive into the same folder
for filename in all_files:
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall(path)
# List the regular files in the extracted 'detail' subfolder
detail = [f for f in listdir(join(path, 'detail')) if isfile(join(path, 'detail', f))]
After this point, my detail list looks like this:
['119218.csv',
'119218_coaching.csv',
'119218_coaching_comment.csv',
'119218_emp_monitor.csv',
'119218_monitor_work_time.csv',
'119218_reponse_text.csv',
'119218_response.csv',
'119219.csv',
]
What I want is to read only the purely numeric ones, 119218.csv and 119219.csv, and then pd.concat them, because they are data tables with the same shape.
Thanks in advance
Upvotes: 0
Views: 413
Reputation: 18476
From your file list, you can filter for the file names whose characters are all digits, apart from the .csv extension. There are numerous ways to do so; one is to split each name on .csv and check whether every character in the first piece is a digit.
files=['119218.csv',
'119218_coaching.csv',
'119218_coaching_comment.csv',
'119218_emp_monitor.csv',
'119218_monitor_work_time.csv',
'119218_reponse_text.csv',
'119218_response.csv',
'119219.csv',
]
files = [eachFile for eachFile in files if all(c.isdigit() for c in eachFile.split('.csv')[0])]
OUTPUT:
['119218.csv', '119219.csv']
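One edge case worth noting: all(...) over an empty string is True, so a file named just .csv would pass the check above. A slightly stricter variant, as a minimal sketch, could split off the extension with os.path.splitext and use str.isdigit on the stem, which is False for an empty string. The helper name is_numeric_csv is made up for illustration and is not part of the original answer.

```python
import os


def is_numeric_csv(name):
    """Return True if name is a non-empty run of digits followed by .csv."""
    stem, ext = os.path.splitext(name)
    # str.isdigit() is False for an empty string, so a bare ".csv" is rejected
    return ext.lower() == ".csv" and stem.isdigit()


files = ['119218.csv',
         '119218_coaching.csv',
         '119218_coaching_comment.csv',
         '119218_emp_monitor.csv',
         '119218_monitor_work_time.csv',
         '119218_reponse_text.csv',
         '119218_response.csv',
         '119219.csv',
         ]

numeric_files = [f for f in files if is_numeric_csv(f)]
print(numeric_files)
```

With the list from the question, this keeps the same two names as the one-liner above.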
Upvotes: 2
Reputation: 83
You just have to modify this one line:
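detail = [f for f in listdir(r'C:\Users\han\Desktop\Selenium\detail') if re.fullmatch(r"[0-9]+\.csv", f) and isfile(join(r'C:\Users\han\Desktop\Selenium\detail', f))]
Don't forget to import re. Using re.fullmatch with [0-9]+ requires the whole name to be one or more digits followed by .csv, so names such as .csv or 119218.csv.bak are rejected.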
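Neither answer shows the pd.concat step the question asks for, so here is a rough sketch of how the filtering and reading could fit together. It assumes the numeric CSVs all share the same columns, as the question says. The function name concat_numeric_csvs and the NUMERIC_CSV pattern are made up for illustration.

```python
import re
from pathlib import Path

import pandas as pd

# Whole file name must be one or more digits followed by ".csv"
NUMERIC_CSV = re.compile(r"[0-9]+\.csv")


def concat_numeric_csvs(folder):
    """Read every purely numeric CSV in folder and stack them into one DataFrame."""
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and NUMERIC_CSV.fullmatch(p.name)
    )
    if not paths:
        # pd.concat raises on an empty list, so fail with a clearer message
        raise FileNotFoundError(f"no numeric CSV files found in {folder}")
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

With the folder from the question, it could be called as df = concat_numeric_csvs(r'C:\Users\han\Desktop\Selenium\detail').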
Upvotes: 1