Gareth

Reputation: 1

merge lines based on pattern

I have been struggling to figure out how to 'unparse' lines in a log file (continuation lines start with one of two delimiters, '@' or '|') so that all lines related to one timestamp end up on one line.

Example:
2016-03-22 blah blah blah
|blah blah
|blah blah blah
@blah 
|blah blah blah
2016-03-22 blah blah blah
|blah blah blah
@blah blah
@blah blah blah
|blah 

Required Output

2016-03-22 blah blah blah |blah blah |blah blah blah @blah |blah blah blah
2016-03-22 blah blah blah |blah blah blah @blah blah @blah blah blah |blah

I thought I had this sussed simply by using xargs to put everything on one line and then using sed to add newlines before each 2016, but I discovered there is a limit on the number of characters per line, and the log file is so big that xargs was creating multiple lines.

Removing the line breaks before lines starting with | and @ would solve this, but I can't fathom how to do that either.

I've searched on here and found a few people posting similar questions but I can't interpret some of the solutions to fit in with my issue as I'm not familiar enough with sed/awk/xargs.

Would appreciate if anyone can offer some suggestions.

Thanks

Upvotes: 0

Views: 180

Answers (5)

vic4567

Reputation: 1

But how do I merge lines (only the words from the lines) when the same word exists in both files? All the words change automatically, and the files 1.txt and 2.txt change automatically too, as part of a package manager's script in a GNOME 2 environment. Here "link" stands for http://link

Example INPUT:

1.txt contains the detected HTTP links and versions of the packages:

link1/autotools-dev_20100122.1

link4/debhelper_8.0.0

link5/dreamchess_0.2.0

link5/dreamchess_0.2.0-2

link7/quilt_0.48

link7/quilt_0.48-7

link34/quilt-el_0.46.2

link34/quilt-el_0.46.2-1

2.txt contains the needed extensions of the packages:

autotools-dev_*.diff.gz

debhelper_*.diff.gz

debhelper_*.orig.tar.gz

libmxml-dev_*.diff.gz

libmxml-dev_*.dsc

libmxml-dev_*.orig.tar.gz

libsdl1.2-dev_*.diff.gz

libsdl1.2-dev_*.dsc

libsdl1.2-dev_*.orig.tar.gz

libsdl-image1.2-dev_*.diff.gz

libsdl-image1.2-dev_*.dsc

libsdl-image1.2-dev_*.orig.tar.gz

quilt_*.diff.gz

DESIRED OUTPUT to file 3.txt:

link1/autotools-dev_20100122.1.diff.gz

link4/debhelper_8.0.0.diff.gz

link4/debhelper_8.0.0.orig.tar.gz

libmxml-dev_*.diff.gz

libmxml-dev_*.dsc

libmxml-dev_*.orig.tar.gz

libsdl1.2-dev_*.diff.gz

libsdl1.2-dev_*.dsc

libsdl1.2-dev_*.orig.tar.gz

libsdl-image1.2-dev_*.diff.gz

libsdl-image1.2-dev_*.dsc

libsdl-image1.2-dev_*.orig.tar.gz

link7/quilt_0.48.diff.gz

link7/quilt_0.48-7.diff.gz

So I need a script that automatically detects the package names common to 1.txt and 2.txt and writes to 3.txt, on the same line where the package name occurs:

  • the HTTP link and version from 1.txt

  • the extension from 2.txt

  • the lines from 2.txt whose package name does not occur in 1.txt, unchanged

Upvotes: 0

Walter A

Reputation: 19982

Replace the newlines with spaces, put a newline at the end of the line and insert a newline before each 2016 that follows a space (GNU sed):

echo '2016-03-22 blah blah blah
|blah blah
|blah blah blah
@blah
|blah blah blah
2016-03-22 blah blah blah
|blah blah blah
@blah blah
@blah blah blah
|blah ' | tr '\n' ' ' | sed -e 's/ *$/\n/' -e 's/ 2016-/\n2016-/g'

Upvotes: 0

potong

Reputation: 58371

This might work for you (GNU sed):

sed ':a;N;/\n....-..-.. /!s/\n/ /;ta;P;D' file

Read two lines into the pattern space and, if the appended line is not the start of a new record, replace the newline with a space and repeat, i.e. keep appending lines to the existing one. If the appended line is the start of a new record, print the first line, delete it and repeat.
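
The same script can be written one command per line with comments, which may make the loop easier to follow. This is only a sketch: the wrapper name join_records_sed is hypothetical, and it relies on GNU sed printing the pattern space when N runs out of input.

```shell
# Hypothetical wrapper around the one-liner; reads files or stdin.
join_records_sed() {
    sed '
    # loop label
    :a
    # append the next input line to the pattern space
    N
    # if the appended line does not look like "YYYY-MM-DD ", join it with a space
    /\n....-..-.. /!s/\n/ /
    # a join happened: go back and append another line
    ta
    # otherwise the appended line starts a new record: print the finished one
    P
    # delete the printed record and restart with the new record kept
    D
    ' "$@"
}
```

For example, printf '2016-03-22 a\n|b\n2016-03-23 c\n' | join_records_sed puts "|b" on the first record's line and leaves the second record on its own line.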

Upvotes: 0

Etan Reisner

Reputation: 80921

anubhava's answer works, but it buffers the entirety of each merged line before printing it.

This prints as it reads each input line.

awk '{printf "%s%s", /^[|@]/?OFS:(NR>1)?"\n":"", $0} END{print ""}'
  • /^[|@]/ match lines starting with @ or |
  • ?OFS if matched lead with OFS (output field separator, space by default)
  • : otherwise
    • (NR>1) if we aren't on the first line
    • ?"\n" output a newline
    • :"" otherwise output an empty string (to avoid a blank line at the top of the output)
  • END{print ""} make sure we end the last line with a newline
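
The breakdown above can also be spelled out with the separator in a named variable. This is a sketch of the same logic, not a different method; the function name join_stream and the variable sep are made up for illustration.

```shell
# Hypothetical wrapper; reads files or stdin and prints as it goes.
join_stream() {
    awk '
    {
        # Continuation lines (starting with | or @) are glued on with OFS;
        # any other line starts a new output line, except the very first one.
        sep = ($0 ~ /^[|@]/) ? OFS : (NR > 1 ? "\n" : "")
        printf "%s%s", sep, $0
    }
    END { print "" }    # terminate the last output line
    ' "$@"
}
```

Because nothing is stored between input lines, memory use stays constant no matter how long a merged record gets.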

Upvotes: 0

anubhava

Reputation: 784958

You can use this awk command:

awk '/^[0-9]{4}(-[0-9]{2}){2}/ {
   if (p!="")
      print p
   p=$0
   next
}
{
   p = p OFS $0
}
END { 
   print p
}' file

2016-03-22 blah blah blah |blah blah |blah blah blah @blah |blah blah blah
2016-03-22 blah blah blah |blah blah blah @blah blah @blah blah blah |blah
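
Some awk implementations (older mawk, and gawk before 4.0 unless run with --re-interval) do not support the {4} and {2} interval expressions. If that applies to you, a variant with the digits written out should behave the same. This is a sketch under that assumption; the function name join_records_awk is hypothetical, and the extra checks for an empty buffer are my additions so that empty input prints nothing.

```shell
# Hypothetical wrapper; same buffering approach without interval expressions.
join_records_awk() {
    awk '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ {
        if (p != "") print p            # flush the previous record
        p = $0                          # start a new record
        next
    }
    { p = (p == "" ? $0 : p OFS $0) }   # append a continuation line
    END { if (p != "") print p }        # flush the last record, if any
    ' "$@"
}
```

It is used the same way as the command above: join_records_awk file.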

Upvotes: 1
