Reputation: 123
I have a string from which I want to extract the key information:
gbk_kings_common_20171201_20180131_66000.0k_2017-12-01_TO_2018-01-31_id12_1277904128.csv
Namely, I would like to find the following:
gbk_kings_common_20171201_20180131
66000.0k
2017-12-01_TO_2018-01-31
id12_1277904128
But I'm having difficulty writing the regex, since the file identifier can vary in length, although the rest of the information is fairly fixed when delimited by underscores.
Upvotes: 0
Views: 57
Reputation: 12015
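You can use the pattern r'(.*)_(.*)_([\d-]+_TO_[\d-]+)_(id[\d_]*)' to search your string. The first (.*) is greedy, so it takes as much of the identifier as it can and then backtracks until the remaining groups fit, which is why the identifier's length does not matter.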
>>> import re
>>> s = "gbk_kings_common_20171201_20180131_66000.0k_2017-12-01_TO_2018-01-31_id12_1277904128.csv"
>>> sre = re.search(r'(.*)_(.*)_([\d-]+_TO_[\d-]+)_(id[\d_]*)', s)
>>> file_id, size, date, type_id = sre.groups()
>>> print(file_id, size, date, type_id)
gbk_kings_common_20171201_20180131 66000.0k 2017-12-01_TO_2018-01-31 id12_1277904128
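If you want something a little stricter, here is a minimal sketch of the same idea with named groups and anchors. It assumes every file name ends in .csv and that the size field never contains an underscore; neither assumption comes from your question, so adjust as needed. The names FILENAME_RE and parse_filename are made up for illustration.

```python
import re

# Hypothetical helper names; the pattern is a stricter variant of the one above.
# Assumptions: the name ends in ".csv" and the size token contains no underscore.
FILENAME_RE = re.compile(
    r'^(?P<file_id>.+)'               # identifier of any length (greedy, backtracks)
    r'_(?P<size>[^_]+)'               # a single underscore-free token, e.g. 66000.0k
    r'_(?P<date>[\d-]+_TO_[\d-]+)'    # date range, e.g. 2017-12-01_TO_2018-01-31
    r'_(?P<type_id>id\d+_\d+)'        # e.g. id12_1277904128
    r'\.csv$'
)

def parse_filename(name):
    """Return a dict of the four fields, or None if the name does not match."""
    m = FILENAME_RE.match(name)
    return m.groupdict() if m else None
```

Calling parse_filename(s) on the string above should give a dict with the keys file_id, size, date and type_id, and a name that does not fit the layout gives None instead of raising on unpacking.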
Upvotes: 4