Rowling

Reputation: 213

Using regex with list comprehension in Python

I have the following code, which stores the names of all CSV files from a specific folder in a list:

import pandas as pd
import re
import os

files = os.listdir('.')
filename=[filename for filename in files if filename.endswith('.csv')]

However, my folder contains two types of CSV files: one type ends with a two-digit suffix, for example _20.csv (or _18.csv, _01.csv), and the other ends with _Raw.csv.

I only need the first type stored in my list. I know a regular expression can help with that, so I searched around and came up with the following code, but it doesn't seem to work. Can anyone offer some advice?

filename = [re.search(r'^\d{2}.csv', filename).group(0) for filename in files]

Upvotes: 3

Views: 7680

Answers (4)

Wiktor Stribiżew

Reputation: 626738

You need to remove ^ (as it matches the start of string location), add $ at the end of the pattern (to make sure the match is at the end of the string) and escape the dot (else, . matches any char but a line break char).

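Note that you must check whether there is a match before accessing .group(), because re.search returns None when nothing matches. The simplest way is to use the match as the filter condition: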

result = [f for f in files if re.search(r'_\d{2}\.csv$', f)] 

Details

  • _ - an underscore
  • \d{2} - 2 digits
  • \. - a literal dot
  • csv - csv text
  • $ - end of string.


Python demo:

import re
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)] 
print(result)
# => ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
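
If the matched suffix itself is wanted (which is what the .group(0) call in the question was after), the None check can be kept inside the comprehension. The following is only a sketch of one way to do it, reusing the sample file names from the demo above; the names pattern and matches are illustrative.

```python
import re

# Sample names taken from the demo above.
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
pattern = re.compile(r'_\d{2}\.csv$')

# pattern.search returns None for non-matching names, so filter on the
# match object first and call .group(0) only on real matches.
matches = [m.group(0) for m in (pattern.search(f) for f in files) if m]
print(matches)
# => ['_20.csv', '_18.csv', '_01.csv']
```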

Upvotes: 7

AResem

Reputation: 309

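re.match would not work because it only matches at the beginning of the string. Use re.search instead. The rest of the previous solution can stay as it is, although the dot should be escaped and the pattern anchored with $ so that only names ending in .csv match.

import os
import re
files = os.listdir('.')
# \. is a literal dot; $ anchors the match at the end of the name
filenames = [f for f in files if re.search(r'_\d+\.csv$', f)]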
print(filenames)
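
To make the difference between the two functions concrete, here is a small sketch with a made-up file name (the variable names are only for illustration):

```python
import re

name = "wertf_18.csv"          # hypothetical file name
pattern = r'_\d{2}\.csv$'

# re.match only tries the pattern at position 0, where the name starts
# with "w", so it fails.
at_start = re.match(pattern, name)

# re.search scans the whole string and finds the suffix.
anywhere = re.search(pattern, name)

print(at_start)            # None
print(anywhere.group(0))   # _18.csv
```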

Upvotes: 3

Sweeper

Reputation: 270980

You should put the regex operation in the if clause so as to filter out those you don't want.

You should also escape the . in the regex, since dots have special meaning in regex (match all non-line terminators).

[filename for filename in files if re.search(r'\d{2}\.csv$', filename)]

If you want only the matched bit, you can do a simple substring:

[filename[-6:] for filename in files if re.search(r'\d{2}\.csv$', filename)]
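
As for why the dot needs escaping, a quick sketch with two made-up names shows the difference; the second name is deliberately not a real .csv file:

```python
import re

# Hypothetical names: the second one has an "x" where the dot should be.
names = ["data_20.csv", "data_20xcsv"]

# Unescaped dot: "." matches any character, so both names pass.
unescaped = [n for n in names if re.search(r'\d{2}.csv$', n)]

# Escaped dot: only a literal "." is accepted.
escaped = [n for n in names if re.search(r'\d{2}\.csv$', n)]

print(unescaped)  # ['data_20.csv', 'data_20xcsv']
print(escaped)    # ['data_20.csv']
```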

Upvotes: 1

Rezvanov Maxim

Reputation: 366

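Try using the re.match method. Because re.match only matches at the start of the string, the pattern needs a leading .* to reach the suffix:

import os
import re
files = os.listdir('.')
# .* consumes the start of the name; \. is a literal dot; $ anchors the end
filenames = [f for f in files if re.match(r'.*_\d+\.csv$', f)]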
print(filenames)

Upvotes: 1
