Reputation: 715
How can i write a regular expression to only match the string names without .csv extension. This should be the required output
Required Output:
['ap_2010', 'class_size', 'demographics', 'graduation','hs_directory', 'sat_results']
Input:
data_files = [
"ap_2010.csv",
"class_size.csv",
"demographics.csv",
"graduation.csv",
"hs_directory.csv",
"sat_results.csv"]
I tried but it return a empty list.
for i in data_files:
regex = re.findall(r'/w+/_[/d{4}][/w*]?', i)
Upvotes: 0
Views: 906
Reputation: 3253
If you want regex, the solution is r'(.*)\.csv
:
for i in data_files:
regex = re.findall(r'(.*)\.csv', i)
print(regex)
Upvotes: 2
Reputation: 71580
l = [
"ap_2010.csv",
"class_size.csv",
"demographics.csv",
"graduation.csv",
"hs_directory.csv",
"sat_results.csv"]
print([i.rstrip('.'+i.split('.')[-1]) for i in l])
Upvotes: 1
Reputation: 1569
# Input
data_files = [ 'ap_2010.csv', 'class_size.csv', 'demographics.csv', 'graduation.csv', 'hs_directory.csv', 'sat_results.csv' ]
import re
pattern = '(?P<filename>[a-z0-9A-Z_]+)\.csv'
prog = re.compile(pattern)
# `map` function yields:
# - a `List` in Python 2.x
# - a `Generator` in Python 3.x
result = map(lambda data_file: re.search(prog, data_file).group('filename'), data_files)
Upvotes: 1
Reputation: 1212
Split the string at '.'
and then take the last element of the split (using index [-1]
). If this is 'csv'
then it is a csv file.
for i in data_files:
if i.split('.')[-1].lower() == 'csv':
# It is a CSV file
else:
# Not a CSV
Upvotes: 1
Reputation: 51155
If you really want to use a regular expression, you can use re.sub
to remove the extension if it exists, and if not, leave the string alone:
[re.sub(r'\.csv$', '', i) for i in data_files]
['ap_2010',
'class_size',
'demographics',
'graduation',
'hs_directory',
'sat_results']
A better approach in general is using the os
module to handle anything to do with filenames:
[os.path.splitext(i)[0] for i in data_files]
['ap_2010',
'class_size',
'demographics',
'graduation',
'hs_directory',
'sat_results']
Upvotes: 4