Klausette

Reputation: 412

Ignore linebreaks when searching for patterns with bash

I have files containing a constant stream of letters, capped at 10 letters per line, like so:

ABCDEFGHIJ
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXABCDEF
ABCDEFGHIJ

I want to remove the Xs in groups of three, so I want the result to be

ABCDEFGHIJ
XABCDEF
ABCDEFGHIJ

My current approach is

sed 's/XXX//g' inputFile > outputFile

but that only considers the pattern within a single line, and results in:

ABCDEFGHIJ
X
X
X
XABCDEF
ABCDEFGHIJ

How should I formulate the search pattern so that it ignores line breaks, essentially accepting XXX, X\nXX, and XX\nX? Is this possible with sed or another command?

Upvotes: 1

Views: 602

Answers (3)

potong

Reputation: 58488

This might work for you (GNU sed):

sed -zE 's/(X|X\n){3}//g' file

or without the -z slurp option:

sed -E 'H;$!d;x;s/^\n|(X|X\n){3}//g' file 
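To spell out what the second command does, here is an annotated sketch. The function name strip_xxx_hold and the temporary sample file are only there for illustration and are not part of the answer; GNU sed is assumed, as above.

```shell
# Hypothetical wrapper around the command above, with per-command notes.
strip_xxx_hold() {
  # H    append every input line to the hold space (each preceded by a newline)
  # $!d  on every line except the last, delete the pattern space and read on
  # x    on the last line, swap the accumulated hold space into the pattern space
  # s    remove the leading newline that H introduced, and every run of
  #      three X's in which each X may be followed by a newline
  sed -E 'H;$!d;x;s/^\n|(X|X\n){3}//g' "$1"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' ABCDEFGHIJ XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXABCDEF ABCDEFGHIJ > "$sample"

strip_xxx_hold "$sample"
rm -f "$sample"
```

The effect is the same as slurping with -z: the substitution only sees newlines once the whole file sits in one buffer.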

Upvotes: 0

mattb

Reputation: 3063

This will do it:

paste -sd '' your_file | sed 's/XXX/   /g' | fold -w 10 | sed 's/ //g; /^$/d'
  • paste -sd '' your_file merges all the lines into a single line
  • sed 's/XXX/   /g' replaces every three X's with three spaces (note that this will be problematic if the original file contains spaces, since the last step removes them all; in that case, choose some other unique replacement character)
  • fold -w 10 folds the long line back into lines of 10 characters
  • sed 's/ //g; /^$/d' removes the spaces and then removes any blank lines (if you used some other unique replacement instead of spaces in the second step, remove that instead of spaces in this step)

Outputs

ABCDEFGHIJ
XABCDEF
ABCDEFGHIJ
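If the file may already contain spaces, the same pipeline can be sketched with a different placeholder. This is only a sketch: the function name strip_xxx_fold is made up, and '@' is an assumed placeholder that must not occur anywhere in your data. As in the command above, the empty paste delimiter relies on GNU paste.

```shell
# Hypothetical helper; '@' is an assumed placeholder that must not appear in the input.
strip_xxx_fold() {
  paste -sd '' "$1" |      # join all lines into one long line
    sed 's/XXX/@@@/g' |    # mark each triple with three placeholders so the width is kept
    fold -w 10 |           # cut the line back into 10-character lines
    sed 's/@//g; /^$/d'    # drop the placeholders, then delete lines left empty
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' ABCDEFGHIJ XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXABCDEF ABCDEFGHIJ > "$sample"

strip_xxx_fold "$sample"
rm -f "$sample"
```

Replacing each triple with three placeholder characters rather than deleting it right away is what keeps the remaining letters on their original lines when fold runs.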

Upvotes: 2

Cyrus

Reputation: 88819

With GNU sed, modify your regex to allow an optional newline after each X:

sed -zE 's/X\n{0,1}X\n{0,1}X\n{0,1}//g' inputFile > outputFile

Or shorter:

sed -zE 's/(X\n?){3}//g' inputFile > outputFile

Output to outputFile:

ABCDEFGHIJ
XABCDEF
ABCDEFGHIJ

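-z: separate lines by NUL characters instead of newlines. A text file normally contains no NUL characters, so the whole file is read as one record and \n can be matched by the regex.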
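For anyone who wants to reproduce this, a small self-contained sketch follows. The function name strip_xxx_z and the temporary file are assumptions for illustration only; the sed command itself is the shorter one from above, and it needs GNU sed because of -z.

```shell
# Hypothetical wrapper around the shorter command above (GNU sed only).
strip_xxx_z() {
  # -z  read NUL-separated records, so the whole file becomes one pattern space
  # -E  extended regex; (X\n?){3} matches three X's, each optionally followed by a newline
  sed -zE 's/(X\n?){3}//g' "$1"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' ABCDEFGHIJ XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXABCDEF ABCDEFGHIJ > "$sample"

strip_xxx_z "$sample"
rm -f "$sample"
```

Note that any newline inside a matched triple is deleted together with the X's, so lines are joined at those points rather than re-wrapped to 10 characters.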

Upvotes: 3
