Reputation: 960
I want to match the whole line. The purpose is this: in Python, I list all the files in a directory, then I want to pick out the file URLs that contain certain keywords, e.g. 'qwert2asdf' and 'windows'.
My current regex:
[a-zA-Z0-9_.\-\\]*(qwert2asdf)[a-zA-Z0-9_.\-\\]*(windows)[a-zA-Z0-9_.\-\\]*
It matches line #4, which is what I need:
4\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz
Question 1: Is there a better way, so that I don't have to repeat [a-zA-Z0-9_.\-\\]*?
Question 2: How do I make the match ignore the order of 'qwert2asdf' and 'windows', so that it still matches if 'windows' occurs before 'qwert2asdf'?
1\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\css_boxt_pkg_isys.abcdefg_urururur_20140701_1815.linux.tar.gz
2\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\bbb_pkg_all_systems.abcdefg_urururur_20140701_1815.tar.gz
3\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.linux.tar.gz
4\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz
5\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815_vp.tar.gz
6\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.linux.tar.gz
7\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz
8\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815_vp.tar.gz
9\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_system.abcdefg_urururur_20140701_1815.tar.gz
10\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\doc_pkg_evih_iii_ass_system.abcdefg_urururur_20140701_1815.tar.gz
11\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2400_system.abcdefg_urururur_20140701_1815.linux.tar.gz
12\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2400_system.abcdefg_urururur_20140701_1815.windows.tar.gz
13\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2400_system.abcdefg_urururur_20140701_1815_vp.tar.gz
14\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\css_pkg_css_skm_cgdsg0_system.abcdefg_urururur_20140701_1815.tar.gz
15\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\css_pkg_css_skm_asdfgt_system.abcdefg_urururur_20140701_1815.tar.gz
16\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_boxtppc_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.linux.tar.gz
17\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_boxtppc_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz
18\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_boxtppc_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815_vp.tar.gz
19\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ia_css_2.1.3.0.abcdefg_urururur_20140701_1815.linux.tar.gz
20\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ia_css_2.1.3.0.abcdefg_urururur_20140701_1815.windows.tar.gz
21\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\doc_pkg_ia_css_2.1.3.0.abcdefg_urururur_20140701_1815.tar.gz
Upvotes: 0
Views: 106
Reputation: 70732
You can use positive lookahead assertions here.
^(?=.*qwert2asdf)(?=.*windows)[\w\\.-]*$
Explanation:
^                  # the beginning of the string
(?=                # look ahead to see if there is:
  .*               #   any character except \n (0 or more times)
  qwert2asdf       #   'qwert2asdf'
)                  # end of look-ahead
(?=                # look ahead to see if there is:
  .*               #   any character except \n (0 or more times)
  windows          #   'windows'
)                  # end of look-ahead
[\w\\.-]*          # any character of: word characters (a-z, A-Z, 0-9, _),
                   # '\\', '.', '-' (0 or more times)
$                  # before an optional \n, and the end of the string
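Since the question mentions Python, here is a minimal sketch of how this pattern might be applied with the standard `re` module. The `paths` list and the `pick` helper are made up for illustration (shortened stand-ins for the sample lines, without the leading line numbers); only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer; a raw string keeps the backslashes intact.
PATTERN = re.compile(r'^(?=.*qwert2asdf)(?=.*windows)[\w\\.-]*$')

def pick(paths):
    """Return the paths that contain both keywords, in either order."""
    return [p for p in paths if PATTERN.match(p)]

# Hypothetical, shortened sample paths (not the real listing).
paths = [
    r'\\host\nnn-pppp\ass_qwert2asdf_system.linux.tar.gz',
    r'\\host\nnn-pppp\ass_qwert2asdf_system.windows.tar.gz',
    r'\\host\nnn-pppp\windows_build\ass_qwert2asdf_system.tar.gz',
    r'\\host\nnn-pppp\ia_css_2.1.3.0.windows.tar.gz',
]

for p in pick(paths):
    print(p)
```

Only the second and third paths are printed: the second has the keywords in the original order, the third has 'windows' first.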
Upvotes: 2
Reputation: 180201
You did not say which regex flavor you are using (POSIX, Perl, Java, ...), but I am not aware of any flavor that lets you write a pattern matching the same set of strings as yours without repeating the character class.
You might be tempted to look at back references, but they do not do what you want.
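To see why, here is a quick check in Python (a sketch with made-up strings): a back reference such as `\1` matches the exact text that group 1 captured, not the group's sub-pattern again.

```python
import re

# \1 repeats the text captured by group 1, not the sub-pattern [a-z]+.
backref = re.compile(r'^([a-z]+)-\1$')

same_text = backref.match('abc-abc') is not None   # identical text on both sides
other_text = backref.match('abc-xyz') is not None  # fits [a-z]+ but is different text

print(same_text, other_text)  # True False
```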
Depending on the host language, however, you might be able to reduce duplication by putting the text of the character class into a variable, and interpolating the variable into your regular expression at each of the three points.
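In Python, for instance, that could look roughly like the following sketch. The names `CLS`, `pattern` and `sample` are illustrative, and the sample path is a shortened, made-up stand-in for the real ones.

```python
import re

# The repeated character class, written once.
CLS = r'[a-zA-Z0-9_.\-\\]*'

# Same pattern as in the question, assembled from the shared piece.
pattern = re.compile(CLS + '(qwert2asdf)' + CLS + '(windows)' + CLS)

# Hypothetical shortened path for illustration.
sample = r'\\host\dir\ass_qwert2asdf_system.windows.tar.gz'
print(pattern.fullmatch(sample) is not None)  # True
```

The compiled pattern is character-for-character the one from the question; only the source code is shorter.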
Matching regardless of the order of the 'qwert2asdf' and 'windows' substrings is messier, but it can be done. Here's one way that should work in pretty much any regex engine, modulo any metacharacter (non-)escaping that might need to be performed:
[a-zA-Z0-9_.\-\\]*((qwert2asdf)[a-zA-Z0-9_.\-\\]*(windows)|(windows)[a-zA-Z0-9_.\-\\]*(qwert2asdf))[a-zA-Z0-9_.\-\\]*
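As a rough check in Python, the alternation can be built from a shared character-class string and tried against a few inputs. `CLS`, `either_order`, `has_both` and the short file names are assumptions made for this sketch, not part of the original listing.

```python
import re

# Character class from the question, written once and reused.
CLS = r'[a-zA-Z0-9_.\-\\]*'

# Either keyword order is accepted via the two alternation branches.
either_order = re.compile(
    CLS + '((qwert2asdf)' + CLS + '(windows)|(windows)' + CLS + '(qwert2asdf))' + CLS
)

def has_both(path):
    """True if the whole path matches and contains both keywords."""
    return either_order.fullmatch(path) is not None

print(has_both('a_qwert2asdf_b.windows.tar.gz'))  # True
print(has_both('windows_a_qwert2asdf.tar.gz'))    # True
print(has_both('a_qwert2asdf_b.linux.tar.gz'))    # False
```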
A regex engine that supports zero-width lookaround assertions (lookahead or lookbehind) would provide other alternatives, but I don't think any would come out shorter.
Upvotes: 0