Reputation: 79
i am trying to extract a fixed format string from a column, this column contains multiple fixed format string.
e.g.
2018-1234567,2018-0123456,2018-2345678
but it always contain many kinds of garbage strings(casue this column input manually)
e.g.
SN:2018-1234567 and 2018-0123456 and 2018-2345678
or
tracking ID:2018-1234567; 2018-0123456;2018-2345678
i can use RXREPLACE function to elimated some of them but i can not remove some unpredicted garbage string in the future.
so i am wondering, is there a function can just extract fixed format strings from this column? the fixed format string is YYYY-NNNNNNN (year-7digit number)
if i can do this, i needn't worry about the furture unknown garbage string in this column.
thanks a lot!
Upvotes: 1
Views: 242
Reputation: 1492
This might not be a cast-iron solution however it works for the examples, without cryptic look-arounds, if you are happy to have a space as the separator:
Trim(RXReplace([mystring],"\\\D*\\\d*\\\D*(\\\d{4}-\\\d{7})","$1,","g"))
It captures the string you like (\\d{4}-\\d{7}) and replaces it with itself $1 followed by a space. Everything that is not surrounded by () is left behind. It assumes that whatever 'garbage' appears before/between each string is made of a mixture of digits (\\d) and non-digits (\\D).
In fact simply \\D* would have worked, but it would not have removed something like tracking123 ID:
The Trim()
is to remove the last space.
Gaia
Upvotes: 1