Plouf

Reputation: 397

Find a line with a single word and merge it with the next line

I have an issue with grep that I can't sort out.

What I have. A list of first names and last names, like:

John Doe
Alice Smith
Bob Smith

My problem. Sometimes the first name and the last name are split across two lines, like:

Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line

What I'd like to achieve. Concatenate the "orphan" name with the next line.

Alice Smith
Bob Doolittle
Mark Von Doe

What I already tried

grep -ozP "^\w+\n\w.+" file | tr '\n' ' '

So, here I ask grep to find a line with just one word and concatenate it with the following line, even if this next line has more than one word.

It works correctly, but only if the isolated word is at the very beginning of the file. If it appears below the first line, grep does not spot it. So a quick and dirty solution where I would loop through the file and remove a line after each pass doesn't work for me.

Upvotes: 1

Views: 106

Answers (5)

potong

Reputation: 58568

This might work for you (GNU sed):

sed -E 'N;s/^(\S+)\n/\1 /;P;D' file

Append the next line.

If the first line in the pattern space contains one word only, replace the following newline with a space.

Print/delete the first line and repeat.
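
As a rough illustration of the three steps above, here is the same command run on the sample from the question (without the trailing comment). The function name merge_orphans is made up for this sketch, and \S assumes GNU sed, as noted above:

```shell
# merge_orphans is a hypothetical wrapper name; \S is a GNU sed extension.
merge_orphans() {
  # N appends the next line, s joins a one-word first line to it,
  # P prints up to the first newline, D deletes that part and restarts.
  sed -E 'N;s/^(\S+)\n/\1 /;P;D'
}

printf '%s\n' 'Alice' 'Smith' 'Bob Doolittle' 'Mark' 'Von Doe' | merge_orphans
# Prints:
# Alice Smith
# Bob Doolittle
# Mark Von Doe
```

Because D restarts the cycle without reading a new line whenever a newline is still in the pattern space, "Mark" is re-examined as a first line after "Bob Doolittle" has been printed.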

Upvotes: 0

Gilles Quénot

Reputation: 185790

Using awk:

awk '
    {f=$2 ? 1 : 0}
    v==1{v=0; print; next} 
    f==0{v=1; printf "%s ", $1; next}
    1
' file

Output

Alice Smith
Bob Doolittle
Mark Von Doe

Upvotes: 0

Timur Shtatland

Reputation: 12465

Use this Perl one-liner:

perl -lane 'BEGIN { $is_first_name = 1; } if ( @F == 1 && $is_first_name ) { @prev = @F; $is_first_name = 0; } else { print join " ", @prev, @F; $is_first_name = 1; @prev = (); }' in_file

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.

Upvotes: 0

markp-fuso

Reputation: 35366

If awk is acceptable:

awk '
NF==1 {printf "%s ",$1; getline; print; next}
1' names.dat

Where:

  • NF==1 - if only one name/field in the current record ...
  • printf / getline / print / next - print field #1, read next line and print it, then skip to next line
  • 1 - print all other lines as is

As a one-liner:

awk 'NF==1{printf "%s ",$1;getline;print;next}1' names.dat

This generates:

Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line
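
One edge case worth considering: if the very last line of the input is a single word, getline has nothing left to read and leaves $0 unchanged, so the unguarded version would likely print that word twice. A minimal sketch of a guarded variant follows. It assumes a trailing orphan should be printed on its own, and merge_names is a hypothetical wrapper name, not part of the commands above:

```shell
# merge_names is a hypothetical wrapper; it reads names from standard input.
merge_names() {
  awk '
  NF==1 {
      first = $1
      # getline returns 1 on success, 0 at end of input, -1 on error
      if ((getline) > 0) print first, $0   # join orphan with the next line
      else               print first       # no next line: print orphan alone
      next
  }
  1'
}

printf '%s\n' 'Bob Doolittle' 'Alice' 'Smith' 'Mark' | merge_names
# Prints:
# Bob Doolittle
# Alice Smith
# Mark
```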

Upvotes: 2

Wiktor Stribiżew

Reputation: 627469

You can use GNU sed like this:

sed -E -i '/^[^[:space:]]+$/{N;s/\n/ /}' file

See the sed demo:

s='Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line'
sed -E '/^[^[:space:]]+$/{N;s/\n/ /}' <<< "$s"

Output:

Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line

Details:

  • /^[^[:space:]]+$/ matches a line that contains no whitespace, i.e. a line with a single word
  • {N;s/\n/ /} - N appends a newline char followed by the next input line to the current pattern space, and then s/\n/ / replaces this newline char with a space.

Upvotes: 2
