Sai Kumar

Reputation: 715

Regex to match strings in a list without .csv extension

How can I write a regular expression that matches only the file names, without the .csv extension? This is the output I need:

Required Output:
 ['ap_2010', 'class_size', 'demographics', 'graduation', 'hs_directory', 'sat_results']

Input:
data_files = [
"ap_2010.csv",
"class_size.csv",
"demographics.csv",
"graduation.csv",
"hs_directory.csv",
"sat_results.csv"]

I tried the following, but it returns an empty list:

import re
for i in data_files:
    # Never matches: /w and /d are a literal slash plus a letter (the escapes are \w and \d)
    regex = re.findall(r'/w+/_[/d{4}][/w*]?', i)
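For comparison, here is a minimal sketch of what the attempt above seems to be aiming for, assuming every name consists of word characters (letters, digits, underscores) followed by .csv. The escapes use backslashes, and re.fullmatch with a capture group returns only the part before the extension.

```python
import re

data_files = [
    "ap_2010.csv",
    "class_size.csv",
    "demographics.csv",
    "graduation.csv",
    "hs_directory.csv",
    "sat_results.csv",
]

names = []
for i in data_files:
    # \w+ matches letters, digits and underscores; \. is a literal dot.
    # fullmatch requires the whole string to fit the pattern.
    match = re.fullmatch(r'(\w+)\.csv', i)
    if match:  # skip anything that is not "<word>.csv"
        names.append(match.group(1))

print(names)
```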

Upvotes: 0

Views: 906

Answers (5)

pwxcoo

Reputation: 3253

If you want a regex, use r'(.*)\.csv':

import re

for i in data_files:
    regex = re.findall(r'(.*)\.csv', i)
    print(regex)
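That loop prints a one-element list such as ['ap_2010'] for each name. A minimal sketch for collecting the names into a single flat list instead, assuming only names that end in .csv should be kept (the trailing $ anchors the pattern to the end of the string):

```python
import re

data_files = ["ap_2010.csv", "class_size.csv", "demographics.csv",
              "graduation.csv", "hs_directory.csv", "sat_results.csv"]

# re.match anchors at the start and $ anchors at the end, so only names that
# really end in ".csv" match; group(1) is the part captured by (.*).
matches = (re.match(r'(.*)\.csv$', i) for i in data_files)
names = [m.group(1) for m in matches if m]
print(names)
```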


Upvotes: 2

U13-Forward

Reputation: 71580

l = [
"ap_2010.csv",
"class_size.csv",
"demographics.csv",
"graduation.csv",
"hs_directory.csv",
"sat_results.csv"]

# rsplit('.', 1) splits at the last dot only, so everything before the extension is kept
print([i.rsplit('.', 1)[0] for i in l])
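A caveat for string-method approaches: str.rstrip treats its argument as a set of characters, not as a suffix, so 'demographics.csv'.rstrip('.csv') also eats the trailing "cs" of the name. A minimal sketch using str.removesuffix instead, assuming Python 3.9 or later (where that method was added):

```python
data_files = ["ap_2010.csv", "class_size.csv", "demographics.csv",
              "graduation.csv", "hs_directory.csv", "sat_results.csv"]

# rstrip keeps removing any of '.', 'c', 's', 'v' from the right end.
bad = 'demographics.csv'.rstrip('.csv')
print(bad)  # demographi

# removesuffix removes the exact string ".csv" once, and only if present.
names = [i.removesuffix('.csv') for i in data_files]
print(names)
```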

Upvotes: 1

KaiserKatze

Reputation: 1569

import re

# Input
data_files = [ 'ap_2010.csv', 'class_size.csv', 'demographics.csv', 'graduation.csv', 'hs_directory.csv', 'sat_results.csv' ]

# Raw string, so that \. reaches the regex engine as an escaped dot
pattern = r'(?P<filename>[a-z0-9A-Z_]+)\.csv'
prog = re.compile(pattern)

# `map` returns:
#  - a `list` in Python 2.x
#  - a lazy iterator (a `map` object) in Python 3.x, so wrap it in list()
result = list(map(lambda data_file: prog.search(data_file).group('filename'), data_files))
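If any name does not match, search returns None and calling .group('filename') on it raises AttributeError. A minimal sketch that skips non-matching entries; the 'notes.txt' entry is a hypothetical non-CSV file added only for illustration:

```python
import re

# 'notes.txt' is a hypothetical non-CSV entry used to show the filtering.
data_files = ['ap_2010.csv', 'class_size.csv', 'notes.txt']

prog = re.compile(r'(?P<filename>[a-zA-Z0-9_]+)\.csv')

# Pattern.search returns None when there is no match, so drop those
# instead of calling .group() on None.
matches = (prog.search(f) for f in data_files)
result = [m.group('filename') for m in matches if m]
print(result)  # ['ap_2010', 'class_size']
```

Note that search is not anchored, so a name like 'a.csv.bak' would still yield 'a'; prog.fullmatch is stricter if that matters.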

Upvotes: 1

Aidan Rosswood

Reputation: 1212

Split the string at '.' and take the last element of the split (index [-1]). If it is 'csv', the file is a CSV file.

for i in data_files:
    if i.split('.')[-1].lower() == 'csv':
        # It is a CSV file
        print(i, 'is a CSV file')
    else:
        # Not a CSV
        print(i, 'is not a CSV file')
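The same check can also produce the required names. A minimal sketch, assuming the goal is to keep only CSV files and drop their extension; the mixed input list is hypothetical. str.rpartition splits at the last dot, which avoids building the full list that split creates:

```python
# Hypothetical mixed input, including an upper-case extension and a non-CSV file.
data_files = ['ap_2010.csv', 'class_size.CSV', 'readme.txt']

names = []
for i in data_files:
    # rpartition splits at the last '.'; dot is '' when there is no '.' at all.
    stem, dot, ext = i.rpartition('.')
    if dot and ext.lower() == 'csv':
        names.append(stem)

print(names)  # ['ap_2010', 'class_size']
```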

Upvotes: 1

user3483203

Reputation: 51155

If you really want to use a regular expression, you can use re.sub to remove the extension if it exists, and if not, leave the string alone:

import re

[re.sub(r'\.csv$', '', i) for i in data_files]

['ap_2010',
 'class_size',
 'demographics',
 'graduation',
 'hs_directory',
 'sat_results']
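If some files might use an upper-case extension, re.sub accepts a flags argument. A minimal sketch with re.IGNORECASE, using a hypothetical mixed list to show that names without the extension pass through unchanged:

```python
import re

# Hypothetical mixed input for illustration.
data_files = ['ap_2010.csv', 'class_size.CSV', 'notes.txt']

# $ anchors the match to the end of the string; IGNORECASE also removes ".CSV".
names = [re.sub(r'\.csv$', '', i, flags=re.IGNORECASE) for i in data_files]
print(names)  # ['ap_2010', 'class_size', 'notes.txt']
```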

A better approach in general is to use the os module for anything to do with filenames:

import os

[os.path.splitext(i)[0] for i in data_files]

['ap_2010',
 'class_size',
 'demographics',
 'graduation',
 'hs_directory',
 'sat_results']
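The standard-library pathlib module offers an equivalent, shown here as an alternative rather than as part of the answer above: PurePath.stem is the final path component without its last suffix, so, like os.path.splitext, it strips only the last extension.

```python
from pathlib import PurePath

data_files = ["ap_2010.csv", "class_size.csv", "demographics.csv",
              "graduation.csv", "hs_directory.csv", "sat_results.csv"]

# .stem drops only the last suffix, e.g. 'archive.tar.gz' -> 'archive.tar'.
names = [PurePath(i).stem for i in data_files]
print(names)
```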

Upvotes: 4
