Reputation: 1
So the data that I need to work with comes as a set of 10 .csv files, each with a name of the following format:
Example_datatype_date_IDnumber.csv
Each of the 10 files requires different manipulation/analysis and I'd like to do it all with one python script. I can do it successfully with pandas but the issue is that every time I get a new set I have to go in and manually change the date and ID number in the filename when I import the file. Is there a way to import the files and ignore the date and ID number (differentiate only based on datatype)? I would just create a new folder/directory for each set of 10.
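For reference, this is roughly what the manual version looks like right now (the filenames below are made up, but the pattern is the same):

import pandas as pd

# Current approach: the date and ID number are hard-coded, so every new
# set of 10 files means editing these paths by hand.
df_a = pd.read_csv("Example_datatypeA_01-01-17_1234.csv")
df_b = pd.read_csv("Example_datatypeB_01-01-17_5678.csv")
# ... and so on for the other eight files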
Upvotes: 0
Views: 1751
Reputation: 893
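You can pull the datatype token out of each filename with a regular expression and use it to look up the matching handler in a dictionary: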
import os
import re

path_containing_csv_files = '/tmp/test'  # contains e.g. example_int_12-12-16_1.csv, example_string_11-12-16_2.csv

def process_int(filepath):
    # process int data here
    pass

def process_string(filepath):
    # process string data here
    pass

# map each datatype token to its handler
methods = {'int': process_int,
           'string': process_string}

for file_name in os.listdir(path_containing_csv_files):
    # capture the second underscore-separated field (the datatype)
    parsed = re.search(r'[^_]+_([^_]+).*\.csv', file_name)
    if parsed:
        methods[parsed.group(1)](os.path.join(path_containing_csv_files, file_name))
Upvotes: 0
Reputation: 6121
If you put all the files in one folder (say c:\tmp), you can use glob or a regular expression to find them:

import glob

path = r"c:\tmp\*.csv"
for filePath in glob.glob(path):
    # read and analyze the file here
    pass
or
import os
import re

pattern = r'\w+_\w+_\w+_\w+\.csv'
for i in os.listdir("c:\\tmp\\"):
    if re.search(pattern, i):
        # read and analyze the file here
        pass
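Either way, once you have a matching file name you can split it on underscores to pull out the datatype and decide how to process it. A rough sketch combining glob with pandas (the folder path and the per-datatype branches are placeholders):

import glob
import os
import pandas as pd

for filePath in glob.glob(r"c:\tmp\*.csv"):
    # Example_datatype_date_IDnumber.csv -> the second underscore-separated field
    datatype = os.path.basename(filePath).split("_")[1]
    df = pd.read_csv(filePath)
    if datatype == "datatypeA":    # placeholder datatype name
        pass  # datatypeA-specific analysis here
    elif datatype == "datatypeB":  # placeholder datatype name
        pass  # datatypeB-specific analysis here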
Upvotes: 1
Reputation: 2269
You can use regular expressions to detect the datatype from the file name:
import os
import re

files = os.listdir("my_directory")
for fname in files:
    # capture the second underscore-separated field (the datatype)
    m = re.search(r'[^_]+_([^_]+).*\.csv', fname)
    if m:
        datatype = m.group(1)
        print(fname)
        print(datatype)
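With the question's naming scheme Example_datatype_date_IDnumber.csv, m.group(1) is the datatype field, so you can branch on it to run the right pandas processing for each file without ever touching the date or ID number.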
Upvotes: 0