Reputation: 397
I have an issue with grep that I can't sort out.
What I have. A listing of firstnames and lastnames, like:
John Doe
Alice Smith
Bob Smith
My problem. Sometimes, firstname and lastname are disjointed, like:
Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line
What I'd like to achieve. Concatenate the "orphan" name with the next line.
Alice Smith
Bob Doolittle
Mark Von Doe
What I already tried
grep -ozP "^\w+\n\w.+" file | tr '\n' ' '
So, here I ask grep to find a line with just one word and concatenate it with the following line, even if this next line has more than one word.
It works correctly, but only if the isolated word is at the very beginning of the file. If it appears below the first line, grep does not spot it. So a quick and dirty solution where I would loop through the file and remove a line after each pass doesn't work for me.
Upvotes: 1
Views: 106
Reputation: 58568
This might work for you (GNU sed):
sed -E 'N;s/^(\S+)\n/\1 /;P;D' file
Append the next line.
If the first line in the pattern space contains one word only, replace the following newline with a space.
Print/delete the first line and repeat.
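To see those three steps in action, here is a minimal sketch that feeds the question's sample lines to the same command. It assumes GNU sed (for `\S`); the variable names `input` and `result` are only for illustration.

```shell
# Sample input taken from the question; GNU sed is assumed because of \S.
input='Alice
Smith
Bob Doolittle
Mark
Von Doe'

# N appends the next line to the pattern space.
# The substitution joins a lone first word to the line that follows it.
# P prints up to the first newline; D deletes that printed part and
# starts the next cycle with whatever remains in the pattern space.
result=$(printf '%s\n' "$input" | sed -E 'N;s/^(\S+)\n/\1 /;P;D')
printf '%s\n' "$result"
```

With this input, "Bob Doolittle" is printed on its own and "Mark" is carried over to the next cycle, where it is joined with "Von Doe".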
Upvotes: 0
Reputation: 185790
Using awk:
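awk '
{f=$2 ? 1 : 0}                      # f is 1 when the line has a second field
v==1{v=0; print; next}              # previous line was an orphan: finish the joined line
f==0{v=1; printf "%s ", $1; next}   # orphan: print it without a newline and flag it
1                                   # any other line is printed unchanged
' file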
Alice Smith
Bob Doolittle
Mark Von Doe
Upvotes: 0
Reputation: 12465
Use this Perl one-liner:
perl -lane 'BEGIN { $is_first_name = 1; } if ( @F == 1 && $is_first_name ) { @prev = @F; $is_first_name = 0; } else { print join " ", @prev, @F; $is_first_name = 1; @prev = (); }' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
Upvotes: 0
Reputation: 35366
If awk is acceptable:
awk '
NF==1 {printf "%s ",$1; getline; print; next}
1' names.dat
Where:
NF==1 - if only one name/field in the current record ...
printf / getline / print / next - print field #1, read next line and print it, then skip to next line
1 - print all other lines as is
As a one-liner:
awk 'NF==1{printf "%s ",$1;getline;print;next}1' names.dat
This generates:
Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line
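One edge case worth noting: if the very last line of the file is an orphan, `getline` has nothing to read and leaves the current record unchanged, so the script above would print that name twice. A hedged sketch of a guarded variant follows; the variable `first` and the sample input are assumptions for illustration, not part of the original answer.

```shell
# Hypothetical input whose last line is an orphan with nothing after it.
input='Alice
Smith
Bob Doolittle
Mark'

# Join a single-field line with the next line only when getline succeeds.
result=$(printf '%s\n' "$input" | awk '
NF==1 {
    first = $1                      # remember the lone name
    if ((getline) > 0) { print first, $0 } else { print first }
    next
}
1')
printf '%s\n' "$result"
```

Here "Alice" and "Smith" are still joined, while the trailing "Mark" is printed once on its own.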
Upvotes: 2
Reputation: 627469
You can use GNU sed like this:
sed -E -i '/^[^[:space:]]+$/{N;s/\n/ /}' file
See the sed demo:
s='Alice
Smith
Bob Doolittle
Mark
Von Doe //sometimes, there are more than one word on the next line'
sed -E '/^[^[:space:]]+$/{N;s/\n/ /}' <<< "$s"
Output:
Alice Smith
Bob Doolittle
Mark Von Doe //sometimes, there are more than one word on the next line
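Note that the first command uses -i, which rewrites file in place. A minimal sketch for trying it without risking the original data, assuming GNU sed and its -i.bak form, which keeps a copy with a .bak suffix; the scratch file created with mktemp is only for illustration:

```shell
# Hypothetical scratch file so the real data is not touched.
tmp=$(mktemp)
printf 'Alice\nSmith\nBob Doolittle\n' > "$tmp"

# GNU sed: -i.bak rewrites the file and keeps the original as "$tmp.bak".
sed -E -i.bak '/^[^[:space:]]+$/{N;s/\n/ /}' "$tmp"

edited=$(cat "$tmp")        # joined lines
backup=$(cat "$tmp.bak")    # untouched copy of the input
printf '%s\n' "$edited"
rm -f "$tmp" "$tmp.bak"
```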
Details:
/^[^[:space:]]+$/ - finds a line with no whitespace
{N;s/\n/ /} - reads in the next line, and appends a newline char with this new line to the current pattern space, and then s/\n/ / replaces this newline char with a space.
Upvotes: 2