Clean output using sed

Question

I have a file whose lines look like this:

INFO|NOT-CLONED|/folder/another-folder/another-folder|last-folder-name|

What I need is to read the file and get this output:

INFO|NOT-CLONED|last-folder-name

I have this so far:

cat clone_them.log | grep 'INFO|NOT-CLONED' | sed -E 's/INFO\|NOT-CLONED\|(.*)/g'

But it is not working as intended.

NOTE: the last "another-folder" and "last-folder-name" are the same.

John1024 · Accepted Answer

If you want a sed solution:

$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' file
INFO|NOT-CLONED|last-folder-name

How it works:

-E
Use extended regex.

-n
Don't print unless we explicitly tell it to.

s/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p
Look for lines that include INFO|NOT-CLONED| (save this in group 1) followed by anything, .*, followed by | followed by any characters not |, [^|]* (saved in group 2), followed by | at the end of the line. The replacement text is group 1 followed by group 2.

The p option tells sed to print the line if the match succeeds. Since the substitution only succeeds for lines that contain INFO|NOT-CLONED|, this eliminates the need for an extra grep process.
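
To see the filtering in action, here is a small sketch. The sample lines and the temporary file are made up for illustration; only the first line has the INFO|NOT-CLONED| prefix, so it should be the only one printed.

```shell
# Hypothetical sample log (not from the original question's file).
log=$(mktemp)
printf '%s\n' \
  'INFO|NOT-CLONED|/folder/another-folder/last-folder-name|last-folder-name|' \
  'INFO|CLONED|/folder/other-folder|other-folder|' \
  'WARN|something unrelated' > "$log"

# -n suppresses default output; the p flag prints only lines where
# the substitution succeeded, so no separate grep is needed.
result=$(sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' "$log")
echo "$result"

rm -f "$log"
```

With this input, the INFO|CLONED and WARN lines are dropped because the pattern never matches them.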

Variation: Returning just the last-folder-name

To get just the last-folder-name without the INFO|NOT-CLONED| prefix, we need only remove \1 from the replacement text:

$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\2/p' file
last-folder-name

Since we no longer need the first capture group, we could simplify and remove the now unneeded parens so that the only capture group is the last folder name:

$ sed -En 's/INFO\|NOT-CLONED\|.*\|([^|]*)\|$/\1/p' file
last-folder-name
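
As a quick sanity check, the two variations can be run side by side. This is only a sketch: the one-line sample file is hypothetical, and the variable names are chosen for illustration.

```shell
# Hypothetical one-line sample in the format from the question.
log=$(mktemp)
printf '%s\n' \
  'INFO|NOT-CLONED|/folder/another-folder/last-folder-name|last-folder-name|' > "$log"

# Variant with two capture groups, printing only the second.
with_group=$(sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\2/p' "$log")

# Simplified variant with a single capture group.
simplified=$(sed -En 's/INFO\|NOT-CLONED\|.*\|([^|]*)\|$/\1/p' "$log")

echo "$with_group"
echo "$simplified"

rm -f "$log"
```

Both commands should print the same folder name, since dropping the first pair of parentheses only renumbers the remaining group.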
