suyog

Reputation: 23

Unix: how to concatenate lines based on a pattern

I want to join lines in a file as below.

Input

01EPH087362 SHHFHDH 3673
63737
Dhdhj
01EPH636363 DHHDH 
3637737
Hshshhd
01EPH7373838 HDJJDJ

Output

01EPH087362 SHHFHDH 3673 63737 Dhdhj
01EPH636363 DHHDH  3637737 Hshshhd
01EPH7373838 HDJJDJ

I want the output as above; basically every line should start with 01EPH.

I have tried awk and sed but had no luck. Please help if you know how to do this.

Upvotes: 2

Views: 1322

Answers (9)

RavinderSingh13

Reputation: 133760

@suyog: Could you please try the following too and let me know if it helps.

awk '{printf("%s%s",($0 ~ /^01E/ && NR>1)?ORS:NR>1?FS:"",$0)} END{print ""}' Input_file

Output will be as follows.

01EPH087362 SHHFHDH 3673 63737 Dhdhj
01EPH636363 DHHDH  3637737 Hshshhd
01EPH7373838 HDJJDJ

Upvotes: 2

dawg

Reputation: 104092

Here is pure Bash (plus printf) to do this just for giggles:

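pad=    # empty before the first record, a newline afterwards
while IFS= read -r line || [[ -n $line ]]; do
    if [[ "$line" =~ ^01EPH ]]; then
        # A new record starts: end the previous one (if any) first.
        printf "%s%s" "$pad" "$line"
        pad=$'\n'
    else
        # Continuation line: append it after a single space.
        printf " %s" "$line"
    fi
done <file
printf '\n'    # terminate the last record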

Here is a Perl slurp solution:

perl -0777 -ne 'while (/(^01EPH.*?)(?=^01EPH|\z)/gms) {($st=$1)=~s/\n/ /g; print "$st\n" }' file

In both cases, awk is probably better...

Upvotes: 0

Walter A

Reputation: 20032

When you have a file with only \n line-endings, you could use

sed 's/^01EPH/\r&/;$s/$/\r/' inputfile | tr -d "\n" | tr "\r" "\n"

The first part of sed inserts a \r before each 01EPH. The second part appends one at the end so that the last line will end with a linefeed too. Now remove the original linefeeds and replace the marked ones with linefeeds.
It goes through the data three times, so any awk solution will be better for a large file, but I just wanted to show how tr can be combined with sed.

Upvotes: 1

karakfa

Reputation: 67567

Another awk:

$ awk 'NR>1 && /^01EPH/ {print ""}
                        {printf "%s", $0 OFS}
       END              {print ""}' file

01EPH087362 SHHFHDH 3673 63737 Dhdhj
01EPH636363 DHHDH  3637737 Hshshhd
01EPH7373838 HDJJDJ

This prints a newline whenever the pattern matches (except on the first line) and once more at the end; every line is appended to the current output line, followed by OFS.

Upvotes: 2

glenn jackman

Reputation: 247182

My take:

awk '
    /^01EPH/ {printf "%s%s", nl, $0; nl = "\n"; next} 
    {printf " %s", $0} 
    END {print ""}
' file

Upvotes: 2

Ed Morton

Reputation: 204548

$ awk '/^01EPH/{if (NR>1) print buf; buf=$0; next} {buf = buf OFS $0} END{print buf}' file
01EPH087362 SHHFHDH 3673 63737 Dhdhj
01EPH636363 DHHDH  3637737 Hshshhd
01EPH7373838 HDJJDJ

Upvotes: 2

Akshay Hegde

Reputation: 16997

Input

$ cat f
01EPH087362 SHHFHDH 3673
63737
Dhdhj
01EPH636363 DHHDH 
3637737
Hshshhd
01EPH7373838 HDJJDJ

Output

$ awk '(s=/^01EPH/) && NR>1{print ""}{printf("%s%s",(s?"":" "),$0)}END{print ""}' f
01EPH087362 SHHFHDH 3673 63737 Dhdhj
01EPH636363 DHHDH  3637737 Hshshhd
01EPH7373838 HDJJDJ

Upvotes: 1

streetturtle

Reputation: 5850

One liner:

tr '\n' ' ' < file.txt | sed s/01EPH/\\n01EPH/g

tr '\n' ' ' < file.txt - turns every newline into a space, so the whole file becomes one long line

sed s/01EPH/\\n01EPH/g - puts a newline in front of every 01EPH
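
With GNU sed, the pipeline above emits an empty first line and leaves a trailing space on each record. A minimal sketch of one way to tighten it, assuming GNU sed (which accepts \n in the replacement text) and assuming that 01EPH never appears after a space in the middle of a record; join_tr_sed is a hypothetical name, not part of the original answer:

```shell
# join_tr_sed (hypothetical name): read stdin, write the joined records.
join_tr_sed() {
    # 1. tr turns every newline into a space, giving one long line.
    # 2. sed drops the trailing space(s), then replaces the space in front
    #    of each 01EPH with a newline (GNU sed: \n in the replacement).
    tr '\n' ' ' | sed 's/ *$//; s/ 01EPH/\n01EPH/g'
    # The data no longer ends in a newline, so add one.
    echo
}
```

It would be called as join_tr_sed < file.txt. Because the space in front of 01EPH is consumed rather than kept, the first record is left alone and no record ends in a blank.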

Upvotes: -2

Jonathan Leffler

Reputation: 755010

awk '/^01EPH/ { if (record != "") print record; record = ""; pad = "" }
     { record = record pad $0; pad = " " }
     END { if (record != "") print record }'

If the line starts with 01EPH, print the saved information, if there is any, and empty the saved information and the padding.

On every line, add the pad and the new line to the saved information; set the pad to a blank.

At the end, print the saved record if there is anything in it.

This even miraculously preserves the double space between DHHDH and 3637737 because there is a trailing blank on the line ending DHHDH.

Output:

01EPH087362 SHHFHDH 3673 63737 Dhdhj
01EPH636363 DHHDH  3637737 Hshshhd
01EPH7373838 HDJJDJ
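
The three steps described above do not depend on the literal 01EPH, so the same logic can be wrapped in a small shell function that takes the prefix as an argument. This is only a sketch: join_records and the awk variable prefix are hypothetical names, and the prefix is assumed to be a plain string rather than a regular expression.

```shell
# join_records PREFIX: read stdin and join continuation lines onto the
# preceding line that starts with PREFIX (hypothetical helper name).
join_records() {
    awk -v prefix="$1" '
        # A line beginning with the prefix starts a new record:
        # print the saved record, if any, then reset record and pad.
        index($0, prefix) == 1 { if (record != "") print record; record = ""; pad = "" }
        # Every line is appended to the saved record after the pad.
        { record = record pad $0; pad = " " }
        # Print whatever is still saved at end of input.
        END { if (record != "") print record }
    '
}
```

For the sample data this would be run as join_records 01EPH < file. Using index() instead of a regular expression means characters such as . or * in the prefix are matched literally.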

Upvotes: 1
