Marcelo

Reputation: 33

sed: joining lines depending on the fourth one

I have a file that, occasionally, has split lines.

A split is signaled by two consecutive lines containing alphabetic characters, as in blocks 5 and 7 below:


5

00:00:00,000 --> 00:00:00,000

Alphabetic characters
Alphabetic characters

6

00:00:00,000 --> 00:00:00,000

Alphabetic characters

7

00:00:00,000 --> 00:00:00,000

Alphabetic characters
Alphabetic characters

8

00:00:00,000 --> 00:00:00,000

Alphabetic characters

.....

I'd like to join the split lines back together:


5

00:00:00,000 --> 00:00:00,000

Alphabetic characters Alphabetic characters

6

00:00:00,000 --> 00:00:00,000

Alphabetic characters

7

00:00:00,000 --> 00:00:00,000

Alphabetic characters Alphabetic characters

8

00:00:00,000 --> 00:00:00,000

Alphabetic characters

.....

using sed. I'm not sure how to join a line with the preceding one. Any suggestions?

Upvotes: 0

Views: 114

Answers (3)

Ed Morton

Reputation: 203712

sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk:

$ awk '/[[:alpha:]]/{ if (buf=="") {buf=$0; next} $0=buf OFS $0; buf="" } buf!=""{ print buf; buf="" } 1; END{ if (buf!="") print buf }' file

5

00:00:00,000 --> 00:00:00,000

Alphabetic characters Alphabetic characters

6

00:00:00,000 --> 00:00:00,000

Alphabetic characters

7

00:00:00,000 --> 00:00:00,000

Alphabetic characters Alphabetic characters

8

00:00:00,000 --> 00:00:00,000

Alphabetic characters

.....

The above will work robustly, portably, and efficiently on all UNIX systems with all POSIX-compatible awks.
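The buffering idea behind this answer can also be spelled out over several lines. The sketch below is one way to write it, wrapped in a hypothetical shell function named join_split_lines. A text line is held in buf and joined if the very next line also contains letters. Otherwise it is printed unchanged as soon as a non-text line (number, timestamp or blank) arrives, so a lone text line is never carried over into a later block. The sample input is made up.

```shell
#!/bin/sh
# join_split_lines is a hypothetical helper name used for illustration.
# It joins two consecutive lines that both contain alphabetic characters
# and leaves every other line as it is.
join_split_lines() {
    awk '
    /[[:alpha:]]/ {
        if (buf == "") { buf = $0; next }   # first text line: hold it
        $0 = buf OFS $0                     # second text line: join the two
        buf = ""
    }
    buf != "" { print buf; buf = "" }       # lone held line: release it unchanged
    { print }
    END { if (buf != "") print buf }        # held line at end of input
    ' "$@"
}

# Made-up sample: block 5 has a split text line, block 6 does not.
printf '%s\n' 5 '' '00:00:00,000 --> 00:00:00,000' '' 'foo bar' 'baz' '' \
              6 '' '00:00:00,000 --> 00:00:00,000' '' 'qux' \
  | join_split_lines
```

Flushing the buffer on the first non-alphabetic line is what keeps the join limited to lines that are actually adjacent.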

Upvotes: 1

SLePort

Reputation: 15461

Another approach with sed:

sed '/^[[:alpha:]]/{N;/\n[[:alpha:]]/s/\n/ /}' file

When a line starting with an alphabetic character is found, the N command appends the next line to the pattern space. If the embedded newline is followed by an alphabetic character (that is, the appended line also starts with a letter), it is replaced with a space.
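To watch the N-then-substitute step on actual lines, here is a sketch that wraps the same sed command in a hypothetical shell function, join_with_next, and feeds it a small made-up sample laid out like the question's file.

```shell
#!/bin/sh
# join_with_next is a hypothetical wrapper around the one-liner above.
# On a line starting with a letter, N appends the next line; the newline
# is replaced with a space only if that next line also starts with a letter.
join_with_next() {
    sed '/^[[:alpha:]]/{N;/\n[[:alpha:]]/s/\n/ /}' "$@"
}

# Made-up sample: block 5 has a split text line, block 6 does not.
printf '%s\n' 5 '' '00:00:00,000 --> 00:00:00,000' '' 'foo bar' 'baz' '' \
              6 '' '00:00:00,000 --> 00:00:00,000' '' 'qux' '' \
  | join_with_next
```

One edge case: if the last line of the file starts with a letter, N has no next line to read. The GNU sed manual notes that GNU sed prints the pattern space before exiting in that case, while most other seds exit without printing it; writing $!N instead of N inside the braces avoids the difference.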

Upvotes: 1

eddiem

Reputation: 1030

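sed '$!N;/^[a-zA-Z ][^\n]\+\n[a-zA-Z ]/s/\n/ /;P;D' file

Keep a sliding two-line window: $!N appends the next line (except on the last line), and if the first line starts with an alphabetic character or a space and the second line starts with the same, the newline between them is replaced with a space. P then prints up to the first newline and D deletes it, so every line is compared with the line after it. Reading lines in fixed pairs (1+2, 3+4, ...) instead would miss any split that starts on an even-numbered line.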
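Why the window matters shows up on a tiny made-up input where the two text lines sit at positions 2 and 3. The sketch below defines two hypothetical helper functions with the same matching rule (written with .* in place of [^\n]\+): join_fixed_pairs reads lines in fixed pairs with a $!{N;...} block, and join_sliding keeps a sliding window with N, P and D.

```shell
#!/bin/sh
# Compare two ways of pairing lines with sed (sample input is made up).

# Fixed pairs: each cycle reads one line, appends the next with N, and
# prints both, so lines 1+2, 3+4, ... are the only pairs ever examined.
join_fixed_pairs() {
    sed '$!{N;/^[a-zA-Z ].*\n[a-zA-Z ]/s/\n/ /;}' "$@"
}

# Sliding window: P prints the first line, D drops it and restarts the
# cycle with the second line still in the pattern space, so every line
# is compared with the one after it.
join_sliding() {
    sed '$!N;/^[a-zA-Z ].*\n[a-zA-Z ]/s/\n/ /;P;D' "$@"
}

# "foo" and "bar" are lines 2 and 3, which straddle a fixed-pair boundary.
printf '1\nfoo\nbar\n' | join_fixed_pairs   # foo and bar stay on separate lines
printf '1\nfoo\nbar\n' | join_sliding       # foo and bar are joined
```

With fixed pairs, foo is examined only together with the line before it, so the split is missed; with the sliding window, foo and bar end up in the pattern space together and are joined.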

Upvotes: 1
