Reputation: 2968
I have a set of filenames like:
PATJVI_RNA_Tumor_8_3_63BJTAAXX.310_BUSTARD-2012-02-19.fq.gz
PATMIF_RNA_Tumor_CGTGAT_2_1_BC0NKBACXX.334_BUSTARD-2012-05-07.fq.gz
I would like to have a single regex (in Python, fyi) that can capture each of the groups between the "_" characters. However, note that the second filename contains a group that is absent from the first. Of course, one can use a string split, etc., but I would like to do this with a single regex. The regex for the first filename is something like:
(\w+)_(\w+)_(\w+)_(\d)_(\d)_(\w+)\.(\d+)_(\S+)\.fq\.gz
And the second will be:
(\w+)_(\w+)_(\w+)_(\w+)_(\d)_(\d)_(\w+)\.(\d+)_(\S+)\.fq\.gz
I'd like the regex group to be empty when the optional group is absent and to contain the optional group when it is present (so that I can use it later when constructing a new filename with \4).
Upvotes: 2
Views: 152
Reputation: 12100
To make a group optional, you can add ? after the desired group. Like this:
(\w+)?
But your example has an underscore that should be optional as well. To deal with it, you can group it together with the optional group:
((\w+)_)?
However, this will add a new group to your match results. To avoid it, use a non-capturing group:
(?:(\w+)_)?
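As a small illustration of the difference (a sketch using Python's standard re module; the sample string "CGTGAT_2" is just a made-up fragment), the capturing wrapper adds an extra group while the non-capturing one does not:

```python
import re

# Wrapping the optional part in a capturing group creates an extra group.
capturing = re.compile(r"((\w+)_)?")
# (?: ... ) groups for the purpose of "?" without creating a new group.
non_capturing = re.compile(r"(?:(\w+)_)?")

# .groups reports how many capturing groups a compiled pattern has.
print(capturing.groups, non_capturing.groups)

# Hypothetical sample fragment, only for demonstration.
sample = "CGTGAT_2"
print(capturing.match(sample).groups())      # includes the "CGTGAT_" wrapper
print(non_capturing.match(sample).groups())  # only the inner value
```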
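The final result will look like this:
(\w+)_(\w+)_(\w+)_(?:(\w+)_)?(\d)_(\d)_(\w+)\.(\d+)_(\S+)\.fq\.gz
One caveat: \w also matches "_", so on the second filename the greedy first group captures PATMIF_RNA and the optional group stays empty. If the fields never contain underscores, using [^\W_]+ in place of \w+ keeps them aligned:
([^\W_]+)_([^\W_]+)_([^\W_]+)_(?:([^\W_]+)_)?(\d)_(\d)_([^\W_]+)\.(\d+)_(\S+)\.fq\.gz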
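Here is a rough sketch for checking the pattern against the two filenames from the question. Because \w matches the underscore too, the sketch also tries a variant that uses [^\W_]+ ("\w minus underscore") for each field. That variant, and the names FIELD, STRICT_PATTERN and split_fields, are my own assumptions for illustration; it only makes sense if no field ever contains an underscore.

```python
import re

# Pattern from the answer, unchanged.
ANSWER_PATTERN = re.compile(
    r"(\w+)_(\w+)_(\w+)_(?:(\w+)_)?(\d)_(\d)_(\w+)\.(\d+)_(\S+)\.fq\.gz"
)

# Assumed variant: fields may not contain "_", so greedy groups cannot
# swallow a separator and shift the remaining fields.
FIELD = r"[^\W_]+"
STRICT_PATTERN = re.compile(
    rf"({FIELD})_({FIELD})_({FIELD})_(?:({FIELD})_)?"
    rf"(\d)_(\d)_({FIELD})\.(\d+)_(\S+)\.fq\.gz"
)

FILENAMES = [
    "PATJVI_RNA_Tumor_8_3_63BJTAAXX.310_BUSTARD-2012-02-19.fq.gz",
    "PATMIF_RNA_Tumor_CGTGAT_2_1_BC0NKBACXX.334_BUSTARD-2012-05-07.fq.gz",
]


def split_fields(name, pattern=STRICT_PATTERN):
    """Return the nine groups, with '' for an absent optional group."""
    m = pattern.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    return m.groups(default="")


for name in FILENAMES:
    print("answer:", ANSWER_PATTERN.fullmatch(name).groups())
    print("strict:", split_fields(name))
```

With the strict variant, group 4 is empty for the first filename and CGTGAT for the second. Passing default="" to groups() avoids having to special-case None when building a new filename from the pieces.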
Upvotes: 7