user97662

Reputation: 960

Regex matching a whole line with a few criteria

I want to match the whole line. The purpose is this: in Python, I will list all the files in a directory, then pick out the file URLs that contain certain keywords, e.g. 'qwert2asdf' and 'windows'.

My current regex:

[a-zA-Z0-9_.\-\\]*(qwert2asdf)[a-zA-Z0-9_.\-\\]*(windows)[a-zA-Z0-9_.\-\\]*

It matches line #4, which is what I need:

4\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz

Question 1: Is there a better way, so that I don't have to repeat [a-zA-Z0-9_.\-\\]*?

Question 2: How do I make the match ignore the order of 'qwert2asdf' and 'windows', so that it still matches if 'windows' appears before 'qwert2asdf'?

1\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\css_boxt_pkg_isys.abcdefg_urururur_20140701_1815.linux.tar.gz
2\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\bbb_pkg_all_systems.abcdefg_urururur_20140701_1815.tar.gz
3\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.linux.tar.gz
4\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz
5\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_qwert2asdf_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815_vp.tar.gz
6\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.linux.tar.gz
7\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz
8\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815_vp.tar.gz
9\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_system.abcdefg_urururur_20140701_1815.tar.gz
10\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\doc_pkg_evih_iii_ass_system.abcdefg_urururur_20140701_1815.tar.gz
11\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2400_system.abcdefg_urururur_20140701_1815.linux.tar.gz
12\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2400_system.abcdefg_urururur_20140701_1815.windows.tar.gz
13\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_evih_iii_ass_2400_system.abcdefg_urururur_20140701_1815_vp.tar.gz
14\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\css_pkg_css_skm_cgdsg0_system.abcdefg_urururur_20140701_1815.tar.gz
15\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\css_pkg_css_skm_asdfgt_system.abcdefg_urururur_20140701_1815.tar.gz
16\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_boxtppc_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.linux.tar.gz
17\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_boxtppc_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815.windows.tar.gz
18\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ass_bss_sw_boxtppc_evih_iii_ass_2401_system.abcdefg_urururur_20140701_1815_vp.tar.gz
19\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ia_css_2.1.3.0.abcdefg_urururur_20140701_1815.linux.tar.gz
20\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\ia_css_2.1.3.0.abcdefg_urururur_20140701_1815.windows.tar.gz
21\\abc123-smb.ccabc.com\nfs\site\disks\.abcdefghigk.1234\abcdfff\day.abcdefg_urururur_20140701_1815\nnn-pppp\doc_pkg_ia_css_2.1.3.0.abcdefg_urururur_20140701_1815.tar.gz

Upvotes: 0

Views: 106

Answers (3)

hwnd

Reputation: 70732

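You can use positive lookaheads here. Each one asserts that its keyword occurs somewhere in the line, so the order of the keywords does not matter.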

^(?=.*qwert2asdf)(?=.*windows)[\w\\.-]*$

Explanation:

^                # the beginning of the string
(?=              # look ahead to see if there is:
  .*             #   any character except \n (0 or more times)
  qwert2asdf     #   'qwert2asdf'
)                # end of look-ahead
(?=              # look ahead to see if there is:
  .*             #   any character except \n (0 or more times)
  windows        #   'windows'
)                # end of look-ahead
[\w\\.-]*        # any character of: word characters (a-z, A-Z, 0-9, _),
                 #  '\' (backslash), '.', '-' (0 or more times)
$                # before an optional \n, and the end of the string
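Since the question mentions Python, here is a minimal sketch of how the pattern above might be applied to a list of paths. The paths are shortened, hypothetical stand-ins for the ones in the question, and the names PATTERN, paths, and matches are only illustrative.

```python
import re

# Pattern from this answer; the raw string keeps the backslashes intact.
PATTERN = re.compile(r'^(?=.*qwert2asdf)(?=.*windows)[\w\\.-]*$')

# Hypothetical, shortened stand-ins for the paths in the question.
paths = [
    r'\\host\share\ass_bss_sw_qwert2asdf_system.linux.tar.gz',    # no 'windows'
    r'\\host\share\ass_bss_sw_qwert2asdf_system.windows.tar.gz',  # both keywords
    r'\\host\share\windows_build\qwert2asdf_pkg.tar.gz',          # both, reversed order
    r'\\host\share\ass_bss_sw_system.windows.tar.gz',             # no 'qwert2asdf'
]

# Keep only the paths for which both lookaheads succeed.
matches = [p for p in paths if PATTERN.match(p)]

for p in matches:
    print(p)
```

Only the second and third paths are kept, because both lookaheads have to succeed and neither cares where its keyword sits in the line.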


Upvotes: 2

John Bollinger

Reputation: 180201

You did not say which regex flavor you are using (POSIX, Perl, Java, ...), but I am unaware of any that lets you write a pattern matching the same set of strings as yours without repeating the character class.

You might be tempted to look at back references, but they do not do what you want.

Depending on the host language, however, you might be able to reduce duplication by putting the text of the character class into a variable, and interpolating the variable into your regular expression at each of the three points.
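In Python, for instance, the interpolation could look roughly like the sketch below. The variable names CHARS and pattern are only illustrative, and the resulting pattern is meant to be character-for-character the same as the one in the question.

```python
import re

# The repeated character class, kept in one place (CHARS is an illustrative name).
CHARS = r'[a-zA-Z0-9_.\-\\]*'

# Same pattern as in the question, built by interpolating CHARS three times.
pattern = re.compile(f'{CHARS}(qwert2asdf){CHARS}(windows){CHARS}')

# Hypothetical, shortened path for a quick check.
sample = r'\\host\share\a_qwert2asdf_b.windows.tar.gz'
print(pattern.fullmatch(sample) is not None)
```

This only removes the duplication from the source code; the regex engine still sees the character class three times.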

Matching regardless of the order of the 'qwert2asdf' and 'windows' substrings is messier, but it can be done. Here's one way that should work in pretty much any regex engine, modulo any metacharacter (non-)escaping that might need to be performed:

[a-zA-Z0-9_.\-\\]*((qwert2asdf)[a-zA-Z0-9_.\-\\]*(windows)|(windows)[a-zA-Z0-9_.\-\\]*(qwert2asdf))[a-zA-Z0-9_.\-\\]*
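As a rough Python sketch, assuming Python's re module is the engine in use, the pattern above can be split across adjacent raw strings for readability and wrapped in a small helper. The names ORDERLESS and has_both are hypothetical.

```python
import re

# The alternation pattern above, split over several raw strings for readability.
ORDERLESS = re.compile(
    r'[a-zA-Z0-9_.\-\\]*'
    r'((qwert2asdf)[a-zA-Z0-9_.\-\\]*(windows)'
    r'|(windows)[a-zA-Z0-9_.\-\\]*(qwert2asdf))'
    r'[a-zA-Z0-9_.\-\\]*'
)

def has_both(path):
    """Return True if the whole path matches and contains both keywords."""
    return ORDERLESS.fullmatch(path) is not None

# Hypothetical, shortened paths for a quick check.
print(has_both(r'\\host\share\qwert2asdf_pkg.windows.tar.gz'))
print(has_both(r'\\host\share\windows\qwert2asdf_pkg.tar.gz'))
print(has_both(r'\\host\share\pkg.windows.tar.gz'))
```

Using fullmatch makes the whole-line requirement explicit, so a path containing a character outside the class (a space, say) is rejected even when both keywords are present.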

A regex engine that supports zero-width lookbehind assertions would provide other alternatives, but I don't think any would come out shorter.

Upvotes: 0

coolerfarmer

Reputation: 385

This should work:

(.*?)windows(.*)

Upvotes: 0
