Reputation: 213
I have the following code, which stores the names of all CSV files in a specific folder in a list:
import pandas as pd
import re
import os
files = os.listdir('.')
filename = [filename for filename in files if filename.endswith('.csv')]
However, my folder contains two types of CSV files: one type ends with two digits, for example _20.csv (or _18.csv, _01.csv), and the other ends with _Raw.csv.
I only need the first type in my list. I know a regular expression can probably help with that, so I did some Googling and came up with the following code, but it doesn't seem to work. Can anyone offer some advice?
filename = [re.search(r'^\d{2}.csv', filename).group(0) for filename in files]
Upvotes: 3
Views: 7680
Reputation: 626738
You need to remove ^ (it matches at the start of the string), add $ at the end of the pattern (so the match has to be at the end of the string), and escape the dot (otherwise . matches any character except a line break).
Also note that re.search returns None when there is no match, so you must check for a match before calling .group(). Putting the search in the if clause of the comprehension does that:
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]
Details:
_ - an underscore
\d{2} - two digits
\. - a literal dot
csv - the text csv
$ - end of string.
See the regex demo.
import re
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)]
print(result)
# => ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
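The point about checking for a match before calling .group() is why the comprehension in the question fails: for a name like ith_Raw.csv, re.search returns None, and None has no .group() method. Below is a minimal sketch using the sample names from the demo above. It assumes Python 3.8+ for the := (assignment expression) operator, and it adds a capture group around the digits so the number can be kept alongside the filename. That capture group is an illustrative addition, not part of the answer's pattern.

```python
import re

files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
# Same pattern as the answer, with the two digits captured as group 1.
pattern = re.compile(r'_(\d{2})\.csv$')

# Calling .group() unconditionally fails on the first non-matching name,
# because re.search returns None when there is no match.
try:
    [pattern.search(f).group(1) for f in files]
except AttributeError as exc:
    print("AttributeError:", exc)

# Bind the match object once (Python 3.8+) and keep only the names
# that matched, together with the captured digits.
pairs = [(f, m.group(1)) for f in files if (m := pattern.search(f))]
print(pairs)
```

Here the if clause runs before the output expression. As a result, m is always a real match object by the time m.group(1) is evaluated.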
Upvotes: 7
Reputation: 309
re.match would not work here, because it only matches at the beginning of the string. Use re.search instead. Apart from that, the re.match solution is fine.
import os
import re
files = os.listdir('.')
# Escape the dot so it only matches a literal "." and anchor the pattern at the end with $
filenames = [f for f in files if re.search(r'_\d+\.csv$', f)]
print(filenames)
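The difference between re.match and re.search described here can be checked on a single sample name. The following is a small sketch. It also shows re.fullmatch, a standard-library function the answers don't mention, which only succeeds if the pattern covers the whole string. The .* prefix in that pattern is an illustrative choice.

```python
import re

name = "wertf_18.csv"

# re.match only tries at position 0, so a pattern that starts with "_"
# cannot match a name that starts with letters.
print(re.match(r'_\d+\.csv', name))                   # None

# re.search scans the string for the first place the pattern matches.
print(re.search(r'_\d+\.csv$', name).group(0))        # _18.csv

# re.fullmatch requires the pattern to cover the entire string,
# so the leading part has to be spelled out (here with .*).
print(re.fullmatch(r'.*_\d+\.csv', name) is not None) # True
```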
Upvotes: 3
Reputation: 270980
You should put the regex operation in the if clause of the list comprehension so that it filters out the files you don't want.
You should also escape the . in the regex, since an unescaped dot has a special meaning (it matches any character except a line terminator).
[filename for filename in files if re.search(r'\d{2}\.csv$', filename)]
If you only want the matched part, you can take a simple slice:
[filename[-6:] for filename in files if re.search(r'\d{2}\.csv$', filename)]
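To see why the dot has to be escaped, and why the 6-character slice works, here is a short sketch. The name wertf_18xcsv is hypothetical. It was chosen to have no real .csv extension but to still end in two digits, one arbitrary character, and "csv".

```python
import re

# Hypothetical names: the second one has no real ".csv" extension.
names = ["wertf_18.csv", "wertf_18xcsv", "ith_Raw.csv"]

# Unescaped: "." matches any character except a newline, so "18xcsv" passes.
loose = [n for n in names if re.search(r'\d{2}.csv$', n)]
# Escaped: "\." only matches a literal dot.
strict = [n for n in names if re.search(r'\d{2}\.csv$', n)]

print(loose)   # ['wertf_18.csv', 'wertf_18xcsv']
print(strict)  # ['wertf_18.csv']

# For names that match, the last 6 characters (2 digits + ".csv")
# are exactly the text the pattern matched.
for n in strict:
    assert n[-6:] == re.search(r'\d{2}\.csv$', n).group(0)
```

The slice only works because this particular pattern always matches a fixed-length suffix. With a variable-length pattern such as \d+, using .group(0) on the match object is the safer choice.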
Upvotes: 1
Reputation: 366
Try using the re.match method:
import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.match(r'(_\d+.csv)', f)]
print(filenames)
Upvotes: 1