Reputation: 274
I am trying to clean up a file that does not always write each new log entry on a new line. Some entries get appended to the end of the previous line. This makes it difficult to analyze the file and get correct counts/data using grep, awk, etc.
This is what each line/entry looks like. Note that I've replaced the actual contents of the log entry, but basically each entry starts with the same string and ends with the same pattern: an ID string followed by a number, e.g. "ID: 20".
Pattern:
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: ##
(Note the number after ID is not always the same for each entry)
What happens is some lines of the file end up having two entries on the same line:
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 33
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 55
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 27INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 14
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 35INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 22
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 10
What I would like to do is insert a newline on that second row above, between the "ID: ##" and "INFO", for all the affected lines in my files.
I can grep all the lines where this occurs with grep "ID: [0-9]*INFO" mylog.log
and have tried a number of sed commands, but can't seem to figure out how to sneak that newline '\n' between a number [0-9]* and INFO...
Any help is appreciated.
Upvotes: 0
Views: 186
Reputation: 784958
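Using grep you can do this. Note that the lazy quantifier .*? is not part of POSIX ERE, so this only splits merged lines where grep -E happens to support it; GNU grep -E treats it as greedy, so with GNU grep use -P instead of -E:
grep -Eo 'INFO:[[:blank:]].*?ID:[[:blank:]][0-9]+' file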
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 33
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 55
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 27
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 14
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 35
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 22
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 10
Upvotes: 1
Reputation: 2815
One lazy method is to first insert a newline wherever "INFO" appears anywhere other than the leftmost position of a line, then pipe the result to a regex check to ensure each line fits the DATA...DATA4 pattern.
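A minimal sketch of that two-step idea, assuming a POSIX sed and grep. The sample values and the `pattern` regex are hypothetical stand-ins for the real log contents, which the question does not show; adjust the regex to the actual entry format.

```shell
# Hypothetical sample: the second line holds two merged entries.
log=$(mktemp)
printf '%s\n' \
  'INFO: DATA: 11 DATA2 22 DATA3: 33.444 DATA4: 55 ID: 33' \
  'INFO: DATA: 11 DATA2 22 DATA3: 33.444 DATA4: 55 ID: 20INFO: DATA: 11 DATA2 22 DATA3: 33.444 DATA4: 55 ID: 55' \
  > "$log"

# Step 1: break the line before any "INFO" that has at least one character
# in front of it, i.e. any "INFO" that is not at the leftmost position.
split=$(sed 's/\(.\)INFO/\1\
INFO/g' "$log")

# Step 2: count lines that do NOT fit the expected entry shape.
# This pattern is an assumption based on the placeholder format in the question.
pattern='^INFO: DATA: [0-9]+ DATA2 [0-9]+ DATA3: [0-9]+\.[0-9]+ DATA4: [0-9]+ ID: [0-9]+$'
bad=$(printf '%s\n' "$split" | grep -Ev "$pattern" | wc -l)

printf '%s\n' "$split"
echo "lines not matching the expected pattern: $bad"
rm -f "$log"
```

Any line reported by the second step is worth inspecting by hand, since it means the split produced something other than a complete entry.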
Upvotes: 0
Reputation: 203209
Given the posted sample input/output, all you need is this, with a sed that has -E to enable EREs and supports \n in the replacement text as meaning "newline" (e.g. GNU and OSX/BSD seds):
$ sed -E 's/([0-9]+)INFO/\1\nINFO/g' file
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 33
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 55
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 27
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 14
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 35
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 22
INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 10
or this otherwise with any sed in any shell on every Unix box:
sed 's/\([0-9][0-9]*\)INFO/\1\
INFO/g' file
If that doesn't work for you then fix your example to include cases where that doesn't work.
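To check the result before replacing the original file, one option is to write the output to a second file and re-run the grep from the question on it. The sketch below assumes a small sample built from three lines of the question; the temporary file names are hypothetical.

```shell
# Sample input taken from the question: the second line has two merged entries.
log=$(mktemp)
fixed=$(mktemp)
printf '%s\n' \
  'INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 33' \
  'INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 20INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 55' \
  'INFO: DATA: ## DATA2 ## DATA3: ##.### DATA4: ## ID: 10' \
  > "$log"

# Count the merged lines before the fix, using the grep from the question.
before=$(grep -c 'ID: [0-9]*INFO' "$log" || true)

# Apply the portable sed and write to a separate file, leaving the original intact.
sed 's/\([0-9][0-9]*\)INFO/\1\
INFO/g' "$log" > "$fixed"

# Count merged lines again, plus the total number of entries.
# grep -c exits non-zero when the count is 0, hence the "|| true".
after=$(grep -c 'ID: [0-9]*INFO' "$fixed" || true)
entries=$(grep -c '^INFO' "$fixed" || true)

echo "merged lines before: $before, after: $after, entries: $entries"
rm -f "$log" "$fixed"
```

Writing to a separate file sidesteps the differences between GNU and BSD `sed -i`, and the original log stays available if the counts look wrong.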
Upvotes: 3