전세영

Reputation: 19

I want to remove multiple lines of text on Linux.

For example:
Before:

1
19:22
abcde

2
19:23

3
19:24
abbff

4
19:25
abbc

After:

1
19:22
abcde

3
19:24
abbff

4
19:25
abbc

I want to remove any section that contains no letters, like section 2.
I think I should use Perl or sed, but I don't know how. I tried this, but it didn't work:

sed 's/[0-9]\n[0-9]\n%s\n//'

Upvotes: 0

Views: 2174

Answers (5)

kvantour

Reputation: 26481

Similar to Ed Morton's solution, but with the following assumptions:

  • The text blocks consist of 2 or 3 lines.
  • If there is a third line, it contains characters from any alphabet.

In essence, under these conditions we only need to check for a third field:

awk 'BEGIN{RS="";ORS="\n\n";FS="\n"}(NF>2)' file

or similar without BEGIN:

awk -v RS= -v ORS='\n\n' -F '\n' '(NF>2)' file
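A small sketch for checking the field-count idea against the sample data from the question. With RS= (paragraph mode) and a newline field separator, every line of a block is one field, so section 2 is the only record with just two fields. The helper names show_nf and keep_three_line_blocks are made up for the example; keeping the wanted blocks takes a test such as NF > 2, since NF < 3 would select section 2 alone.

```shell
#!/bin/sh
# Sample input from the question (section 2 has no alphabetic line).
sample='1
19:22
abcde

2
19:23

3
19:24
abbff

4
19:25
abbc
'

# Hypothetical helper: report the field count of each paragraph.
# With RS= (paragraph mode) and FS set to a newline, each line is one field.
show_nf() {
  awk -v RS= -F '\n' '{ print "record " NR ": NF=" NF }'
}

# Hypothetical helper: keep only the paragraphs that have a third field.
keep_three_line_blocks() {
  awk -v RS= -v ORS='\n\n' -F '\n' 'NF > 2'
}

printf '%s' "$sample" | show_nf
printf '%s' "$sample" | keep_three_line_blocks
```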

Upvotes: 0

Ed Morton

Reputation: 203674

sed is for doing s/old/new/ on individual lines; that is all. For anything else you should be using awk:

$ awk -v RS= -v ORS='\n\n' '/[[:alpha:]]/' file
1
19:22
abcde

3
19:24
abbff

4
19:25
abbc

The above is simply this:

  • RS= (an empty record separator) tells awk the input records are separated by one or more blank lines ("paragraph mode").
  • ORS='\n\n' tells awk the output records must also be separated by blank lines.
  • /[[:alpha:]]/ searches for and prints records that contain alphabetic characters.
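Because RS= treats any run of blank lines as a single separator, and POSIX specifies that leading or trailing blank lines do not produce empty records, the same command also copes with looser spacing than the question shows. A minimal sketch, wrapping the command in a shell function whose name, keep_alpha_blocks, is only for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper around the awk command above:
#   RS=            -> paragraph mode: records are separated by blank lines
#   ORS='\n\n'     -> put one blank line after each record we print
#   /[[:alpha:]]/  -> print only records containing an alphabetic character
keep_alpha_blocks() {
  awk -v RS= -v ORS='\n\n' '/[[:alpha:]]/' "$@"
}

# Input with extra blank lines at the start and between sections.
printf '\n\n1\n19:22\nabcde\n\n\n\n2\n19:23\n\n3\n19:24\nabbff\n' | keep_alpha_blocks
```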

Upvotes: 4

potong

Reputation: 58430

This might work for you (GNU sed):

sed ':a;$!{N;/^$/M!ba};/[[:alpha:]]/!d' file

Gather lines until an empty line or end of file, then delete the collected block if it does not contain an alphabetic character.

This presupposes that the file format is fixed, as in the example. To be more accurate, use:

sed -r ':a;$!{N;/^$/M!ba};/^[1-9][0-9]*\n[0-9]{2}:[0-9]{2}\n[[:alpha:]]+\n?$/!d' file
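The two commands share the same gathering loop and differ only in the final test. The sketch below (assuming GNU sed, since the M flag and \n inside a regex are GNU extensions) spells the first command out over several lines with comments and runs both commands on a sample whose 5 / ab12 block is made up for the comparison: a block containing any letter passes the first command, while the second also requires the number, time and letters-only lines.

```shell
#!/bin/sh
# Requires GNU sed: the M flag on /^$/M and \n inside a regex are GNU extensions.

# The first command, spread over several lines with comments.
loose='
# Top of the gathering loop.
:a
# Unless on the last line, append the next line (N) and loop back
# to :a until the pattern space contains an empty line.
$!{
  N
  /^$/M!ba
}
# The pattern space now holds one block: drop it if it has no letter.
/[[:alpha:]]/!d'

# The stricter command: keep only blocks shaped exactly like
# number, hh:mm, letters, optionally followed by the empty line.
strict='
:a
$!{
  N
  /^$/M!ba
}
/^[1-9][0-9]*\n[0-9]{2}:[0-9]{2}\n[[:alpha:]]+\n?$/!d'

# Sample input; the 5 / ab12 block is made up for this comparison.
sample='1
19:22
abcde

2
19:23

5
19:26
ab12
'

printf '%s' "$sample" | sed "$loose"      # keeps blocks 1 and 5
printf '%s' "$sample" | sed -r "$strict"  # keeps block 1 only
```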

Upvotes: 0

Dave Cross

Reputation: 69274

Simple enough in Perl. The secret is to put Perl in "paragraph mode" by setting the input record separator ($/) to an empty string. Then we print only the records that contain a letter.

#!/usr/bin/perl

use strict;
use warnings;

# Paragraph mode
local $/ = '';

# Read from STDIN a record (i.e. paragraph) at a time
while (<>) {
  # Only print records that include a letter
  print if /[a-z]/i;
}

This is written as a Unix filter, i.e. it reads from STDIN and writes to STDOUT. So if it's in a file called filter, you can call it like this:

$ filter < your_input_file > your_output_file

Alternatively, the same thing works as a Perl one-liner (-00 is the command-line option that puts Perl into paragraph mode):

$ perl -00 -ne 'print if /[a-z]/i' < your_input_file > your_output_file

Upvotes: 4

bipll

Reputation: 11940

If there is exactly one blank line after each paragraph, you can use a long awk one-liner (three patterns, so arguably not a one-liner):

$ echo '1
19:22
abcde

2
19:23

3
19:24
abbff

4
19:25
abbc
' |  awk '/[^[:space:]]/ { accum = accum $0 "\n" } /^[[:space:]]*$/ { if(on) print accum $0; on = 0; accum = "" } /[[:alpha:]]/ { on =  1 }'
1
19:22
abcde

3
19:24
abbff

4
19:25
abbc

The idea is to accumulate non-blank lines, setting a flag once an alphabetic character is found. On a blank input line, print the whole accumulated paragraph if the flag is set, then reset accum to the empty string and the flag to zero.

(Note that if the last line of input is not necessarily empty, you need to add an END block that checks whether a paragraph is still unflushed and prints it if the flag is set.)
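A sketch of the END block that the note describes, so that a final paragraph with no blank line after it is not lost. The first three rules are the ones from the answer; the END rule prints whatever is still in accum if the flag is set. The function name keep_alpha_paragraphs is made up for the example.

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's awk program plus an END block.
keep_alpha_paragraphs() {
  awk '
    # Non-blank line: append it to the current paragraph.
    /[^[:space:]]/   { accum = accum $0 "\n" }
    # Blank line: print the paragraph (and this blank line) if it had a
    # letter, then start a new paragraph.
    /^[[:space:]]*$/ { if (on) print accum $0; on = 0; accum = "" }
    # Remember that the current paragraph contains a letter.
    /[[:alpha:]]/    { on = 1 }
    # End of input: flush the last paragraph if no blank line followed it.
    END              { if (on) printf "%s", accum }
  ' "$@"
}

# Note: no blank line after the last paragraph.
printf '1\n19:22\nabcde\n\n2\n19:23\n\n3\n19:24\nabbff\n\n4\n19:25\nabbc\n' | keep_alpha_paragraphs
```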

Upvotes: 0
