klex52s

Reputation: 437

Regular Expression Matching with Carriage Returns in Python

I have the following data and want to match certain strings as commented below.

FTUS80 KWBC 081454 AAA\r\r TAF AMD   #should match 'AAA'
LTUS41 KCTP 082111 RR3\r\r TMLLNS\r  #should match 'RR3' and 'TMLLNS'
SRUS55 KSLC 082010\r\r HM5SLC\r\r    #should match 'HM5SLC'
SRUS55 KSLC 082010\r\r SIGC  \r\r    #should match 'SIGC  ' including whitespace

I need the following conditions met, but it doesn't work when I put it all together, so I know I have made mistakes. Thanks in advance.

Upvotes: 0

Views: 81

Answers (2)

LMC

Reputation: 12777

It's not clear what the line ending is here, but assuming it's the Unix one (\n), the following expression captures the strings as requested (double quotes added to show whitespace):

sed -rne 's/^.{18} ?([A-Z0-9]{3,3})?\r{2}?([^\r]+)?\r.*$/"\1\2"/p' text.txt

Result

"AAA"
"RR3 TMLLNS"
" HM5SLC"
" SIGC  "
  • .{18} skips the first 18 characters
  •  ?([A-Z0-9]{3,3})? optionally matches a space and then captures AAA or RR3 without that leading space
  • \r{2}?([^\r]+)?\r captures TMLLNS, HM5SLC or SIGC (including the space before it) when preceded by 2 \r characters and followed by 1 \r character
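Since the question is about Python, here is a rough port of the sed expression above to the re module. It is only a sketch under a few assumptions: the \r in the data are real carriage returns, the trailing comments and padding are not part of the records, and the names SED_STYLE, SAMPLES and sed_style_extract are made up for illustration. One thing to watch is that Python treats `{2}?` as a lazy quantifier rather than an optional one, so the optional pair of \r is written as a non-capturing group.

```python
import re

# Port of the sed pattern. `(?:\r{2})?` makes the pair of carriage returns
# optional; a bare `\r{2}?` in Python would still require exactly two.
SED_STYLE = re.compile(r"^.{18} ?([A-Z0-9]{3})?(?:\r{2})?([^\r]+)?\r.*$")

# Sample records from the question, assuming real carriage returns and
# with the trailing comments removed (an assumption about the raw data).
SAMPLES = [
    "FTUS80 KWBC 081454 AAA\r\r TAF AMD",
    "LTUS41 KCTP 082111 RR3\r\r TMLLNS\r",
    "SRUS55 KSLC 082010\r\r HM5SLC\r\r",
    "SRUS55 KSLC 082010\r\r SIGC  \r\r",
]


def sed_style_extract(line):
    # Join both capture groups, mirroring the sed replacement of group 1
    # followed by group 2; unmatched groups contribute an empty string.
    m = SED_STYLE.match(line)
    if m is None:
        return None
    return (m.group(1) or "") + (m.group(2) or "")


for line in SAMPLES:
    print(repr(sed_style_extract(line)))
```

With these sample strings the printed values should line up with the quoted results above, including the leading space on the last two.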

Upvotes: 0

benvc

Reputation: 15120

There is probably a more elegant way, but you could do something like the following:

(?:\d{6}\s?)([A-Z\d]{3})?(?:[\r\n]{2}\s)([A-Z\d]{6}|[A-Z\d]{4}\s{2})?
  • (?:\d{6}\s?) non-capturing group of 6 digits followed by an optional whitespace character
  • ([A-Z\d]{3})? optional capture group of 3 uppercase letters / digits
  • (?:[\r\n]{2}\s) non-capturing group of two line-ending characters (\r or \n) followed by 1 whitespace character
  • ([A-Z\d]{6}|[A-Z\d]{4}\s{2})? optional capture group of either 6 uppercase letters / digits OR 4 uppercase letters / digits followed by 2 whitespace characters
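To try the pattern above in Python, something like the following sketch may help. It assumes the \r in the question are real carriage returns and that the trailing comments are not part of the data; PATTERN, SAMPLES and extract_fields are hypothetical names used only for this example.

```python
import re

# The pattern from this answer, compiled as a raw string so the
# backslashes reach the regex engine untouched.
PATTERN = re.compile(
    r"(?:\d{6}\s?)([A-Z\d]{3})?(?:[\r\n]{2}\s)([A-Z\d]{6}|[A-Z\d]{4}\s{2})?"
)

# Sample records from the question, assuming real carriage returns and
# with the trailing comments removed (an assumption about the raw data).
SAMPLES = [
    "FTUS80 KWBC 081454 AAA\r\r TAF AMD",
    "LTUS41 KCTP 082111 RR3\r\r TMLLNS\r",
    "SRUS55 KSLC 082010\r\r HM5SLC\r\r",
    "SRUS55 KSLC 082010\r\r SIGC  \r\r",
]


def extract_fields(line):
    # search() is used because the 6-digit timestamp sits mid-line.
    m = PATTERN.search(line)
    if m is None:
        return []
    # Optional groups that did not take part in the match are None.
    return [g for g in m.groups() if g is not None]


for line in SAMPLES:
    print(extract_fields(line))
```

Both capture groups are optional, so the None filtering is what keeps a record like the first one down to a single field. The fixed widths (3, 6, or 4 plus 2 spaces) are tied to these four samples and may need loosening for other records.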

Upvotes: 1
