mk97

Reputation: 274

How to insert new line between two characters found in regex

I am trying to clean up a log file in which not every new entry is written on its own line. Some entries get appended to the end of the previous line, which makes it difficult to analyze the file and get correct counts/data using grep, awk, etc.

This is what each line/entry looks like. Note that I've replaced the actual contents of the log entry, but basically each entry starts with the same string and ends with the same pattern: an ID string followed by a number, e.g. "ID: 20".

Pattern:

INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: ##

(Note the number after ID is not always the same for each entry)

What happens is some lines of the file end up having two entries on the same line:

INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 33
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 55
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 27INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 14
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 35INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 22
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 10

What I would like to do is insert a new line between the "ID: ##" and the following "INFO" (as on the second row above) for all the affected lines in my files.

I can find all the lines where this occurs with grep "ID: [0-9]*INFO" mylog.log

and have tried a number of sed commands, but can't figure out how to sneak that newline '\n' between the number [0-9]* and INFO.

Any help is appreciated.

Upvotes: 0

Views: 186

Answers (3)

anubhava

Reputation: 784958

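Using grep you can do this, provided your grep honors the lazy quantifier .*? under -E (POSIX ERE does not define lazy quantifiers; with GNU grep use -oP instead of -Eo):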

grep -Eo 'INFO:[[:blank:]].*?ID:[[:blank:]][0-9]+' file

INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 33
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 55
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 27
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 14
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 35
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 22
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 10

Upvotes: 1

RARE Kpop Manifesto

Reputation: 2815

One lazy method is to first insert a newline wherever "INFO" appears anywhere other than at the start of a line, then pipe the result to a regex pattern check to ensure each line fits the DATA...DATA4 format.
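
A rough sketch of that two-step idea, in case it helps. The function name split_entries and the validation regex entry_re are made up for illustration, and the regex only assumes what the question states (an entry starts with "INFO: DATA:" and ends with "ID: <digits>"), so adjust it to the real log format.

```shell
# Hypothetical helper (not from the answer): put every entry on its own line.
split_entries() {
  # Start a new line before every "INFO:", then drop the empty first line
  # that appears because each record already begins with "INFO:".
  awk '{ gsub(/INFO:/, "\nINFO:"); sub(/^\n/, ""); print }' "$@"
}

# Assumed entry shape; the real fields are elided in the question.
entry_re='^INFO: DATA: .* ID: [0-9]+$'

# Usage sketch: keep well-formed entries, report the rest on stderr.
# split_entries mylog.log | grep -E  "$entry_re"
# split_entries mylog.log | grep -Ev "$entry_re" >&2
```

Splitting on every "INFO:" is deliberately crude, which is why the second step is there: anything that does not look like a complete entry gets flagged instead of silently counted.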

Upvotes: 0

Ed Morton

Reputation: 203209

Given the posted sample input/output, all you need is this, using a sed that has -E to enable EREs and supports \n in the replacement text as meaning "newline" (e.g. GNU and OSX/BSD seds):

$ sed -E 's/([0-9]+)INFO/\1\nINFO/g' file
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 33
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 55
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 27
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 14
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 35
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 22
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 10

or this otherwise with any sed in any shell on every Unix box:

sed 's/\([0-9][0-9]*\)INFO/\1\
INFO/g' file

If that doesn't work for you then fix your example to include cases where that doesn't work.
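
One way to check the result, sketched under the assumption that the grep pattern from the question identifies every merged line. The temporary file and its two sample lines are made up and stand in for mylog.log.

```shell
# Hypothetical sample data standing in for mylog.log.
tmp=$(mktemp)
printf '%s\n' \
  'INFO: DATA: 1 ID: 33' \
  'INFO: DATA: 2 ID: 20INFO: DATA: 3 ID: 55' > "$tmp"

# Portable form: a backslash followed by a literal newline in the replacement.
fixed=$(sed 's/\([0-9][0-9]*\)INFO/\1\
INFO/g' "$tmp")

# Count lines that still have an entry glued to the previous one.
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(printf '%s\n' "$fixed" | grep -c 'ID: [0-9]*INFO' || true)
printf 'merged lines left: %s\n' "$remaining"

rm -f "$tmp"
```

If the count is not zero on the real file, the leftover lines are the cases worth adding to the example.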

Upvotes: 3
