user4104265

Reputation: 47

Join lines depending on the line beginning

I have a file that occasionally has split lines. A continuation line can be recognized because it is empty or starts with a space or another non-numeric character. E.g.

40403813|7|Failed|No such file or directory|1
40403816|7|Hi,
 The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
...

I'd like to join the split line back onto the previous line, like this:

40403813|7|Failed|No such file or directory|1
40403816|7|Hi, The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
...

using a Unix command like sed/awk. I'm not clear on how to join a line with the preceding one.

Any suggestion?

Upvotes: 1

Views: 1662

Answers (6)

Walter A

Reputation: 19982

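If the file contains no \r characters, you can temporarily turn every newline into \r, let sed drop each \r that is followed by a non-digit, and turn the rest back into newlines (this relies on sed understanding \r in the pattern, as GNU sed does):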

tr "\n" "\r" < inputfile | sed 's/\r\([^0-9]\)/\1/g' | tr '\r' '\n'

Upvotes: 1

Ed Morton

Reputation: 203229

Don't do anything based on the values of the strings in your fields as that could go wrong. You COULD get a wrapping line that starts with a digit, for example. Instead just print after every complete record of 5 fields:

$ awk -F'|' '{rec=rec $0; nf+=NF} nf>=5{print rec; nf=0; rec=""}' file
40403813|7|Failed|No such file or directory|1
40403816|7|Hi, The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
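
A variation on the same idea, as a rough sketch rather than a drop-in replacement: count `|` separators instead of fields, so that a field broken across two lines is not counted twice. The function name `join_by_seps` and the `want` variable are made up for illustration, and the default of 5 fields is taken from the sample data. Like any count-based approach, it cannot detect a break inside the last field of a record.

```shell
# join_by_seps [FILE...]  (hypothetical helper; reads stdin when no file is given)
# Set FIELDS to the minimum number of fields in a complete record (default 5).
join_by_seps() {
    awk -v want="${FIELDS:-5}" '
        {
            rec = rec $0                  # glue this physical line onto the buffer
            seps += gsub(/[|]/, "&")      # gsub returns how many "|" this line holds
        }
        seps >= want - 1 {                # enough separators: the record is complete
            print rec
            rec = ""; seps = 0
        }
        END { if (rec != "") print rec }  # flush an incomplete trailing record
    ' "$@"
}
```

Usage would be `join_by_seps file`, or `FIELDS=8 join_by_seps file` for wider records.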

Upvotes: 3

William Pursell

Reputation: 212228

If the continuation lines always begin with a single space:

perl -0000 -lape 's/\n / /g' input

If the continuation lines can begin with an arbitrary amount of whitespace:

perl -0000 -lape 's/\n(\s+)/$1/g' input

It is probably more idiomatic to write:

perl -0777 -ape 's/\n / /g' input

Upvotes: 2

Benjamin W.

Reputation: 52112

If you only ever have the line split into two, you can use this sed command:

sed 'N;s/\n\([^[:digit:]]\)/\1/;P;D' infile

This appends the next line to the pattern space (N) and, if the linebreak is followed by something other than a digit, removes the linebreak. It then prints the pattern space up to the first linebreak (P) and deletes the printed part (D).

If a single line can be broken across more than two lines, we have to loop over the substitution:

sed ':a;N;s/\n\([^[:digit:]]\)/\1/;ta;P;D' infile

This branches from ta to :a if a substitution took place.

With macOS (BSD) sed, a label name runs to the end of its expression, so the label and the branching command must each end their own -e expression:

sed -e ':a' -e 'N;s/\n\([^[:digit:]]\)/\1/;ta' -e 'P;D' infile
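
To see the loop at work on a record broken across three lines, here is a small sketch wrapped in a function. The name `join_continuations` is made up, and the command uses `$!N` instead of `N`, which is a deviation from the commands above: POSIX lets sed quit without printing when `N` finds no next line, whereas GNU sed prints the pattern space first, so skipping `N` on the last line avoids depending on either behaviour.

```shell
# join_continuations [FILE...]  (hypothetical wrapper; reads stdin when no file is given)
join_continuations() {
    # :a    loop label
    # $!N   append the next input line, except when already on the last line
    # s///  drop the newline if the appended line starts with a non-digit
    # ta    a newline was dropped, so go back and look at the following line too
    # P;D   print the completed record, keep the remainder for the next cycle
    sed -e ':a' -e '$!N;s/\n\([^[:digit:]]\)/\1/;ta' -e 'P;D' "$@"
}

# Example: the first record is spread over three physical lines.
printf '%s\n' '1|a,' ' b,' ' c|x' '2|d|y' | join_continuations
```

The example prints the first record rejoined on one line, followed by `2|d|y` unchanged.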

Upvotes: 2

RavinderSingh13

Reputation: 133458

Try:

    awk 'NF{printf("%s",$0 ~ /^[0-9]/ && NR>1?RS $0:$0)} END{print ""}' Input_file

or, equivalently:

    awk 'NF{printf("%s",/^[0-9]/ && NR>1?RS $0:$0)} END{print ""}' Input_file

The NF condition skips empty lines. For every other line, awk checks whether it starts with a digit: if it does and it is not the first line of the file, a newline (RS) is printed in front of it; otherwise the line is printed as is, so it lands directly after the previous one. The END block prints the final newline, which would otherwise be missing from the output.

Upvotes: 2

karakfa

Reputation: 67467

awk to the rescue!

awk -v ORS='' 'NR>1 && /^[0-9]/{print "\n"} NF' file

With ORS set to the empty string, lines are printed back to back, and a newline is printed only in front of a line (other than the first) that starts with a digit. You may want to add a space to ORS if the line break didn't preserve the space.
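
The same rule (a line starting with a digit opens a new record, anything else belongs to the previous one) can also be spelled out with an explicit buffer, which makes the output end with a newline without a separate fix-up. This is a minimal sketch, not a replacement for the one-liner above; the function name `join_on_digit` is invented for illustration.

```shell
# join_on_digit [FILE...]  (hypothetical helper; reads stdin when no file is given)
join_on_digit() {
    awk '
        /^[0-9]/ {                        # a digit opens a new record
            if (rec != "") print rec      # so the buffered record is complete
            rec = $0
            next
        }
        NF { rec = rec $0 }               # non-empty continuation: append it
        END { if (rec != "") print rec }  # flush the last record
    ' "$@"
}
```

Empty lines match neither rule and are dropped, in line with the `NF` test in the one-liner.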

Upvotes: 4
