Reputation: 219

Using sed, awk, etc. to separate after middle dot characters

I could use your assistance for something; I promise I tried really hard to search for answers, but no luck.

I want to separate text between every occurrence of the "·" (middle dot) character (by syllables, basically).

echo con·grat·u·late | sed -e 's/·.*$/·/1'

The code above outputs:

con·

That is the first part of what I want, but ultimately I would like an output of:

con·
grat·
u·
late

This will involve getting the characters between the 1st and 2nd, and between the 2nd and 3rd, occurrences of "·".

If anyone can guide me in the right direction, I will really appreciate it, and I will figure the rest out on my own.

EDIT: My apologies, I displayed my desired output incorrectly. Your solutions worked great, however.

Since it is important for me to keep everything as a single line, how would I output the text between the first dot and the second one, to output:

grat·

I am doing it in UTF-8, Jonathan

Once again, sorry for asking the wrong thing.

Upvotes: 3

Answers (4)

anishsane

Reputation: 20980

You can use simple awk to get these words separated:

$ echo 'con.grat.u.late' | awk -F. '{print $1}'
con
$ echo 'con.grat.u.late' | awk -F. '{print $2}'
grat
$ echo 'con.grat.u.late' | awk -F. '{print $3}'
u
$ echo 'con.grat.u.late' | awk -F. '{print $4}'
late

$ echo 'con.grat.u.late' | awk -F. '{for(i=1;i<=NF;i++){print $i}}' 
con
grat
u
late

-F. tells awk to use . as the field separator.
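The examples above split on an ASCII ".", while the text in the question uses "·" and the edit asks for the dot to be kept. Here is a minimal sketch of the same field-splitting idea with "·" as the separator. The function name nth_syllable is hypothetical, and it assumes the input is UTF-8 and that your awk accepts a multi-byte field separator (common implementations do, but I have not checked every one).

```shell
# Hypothetical helper (not from the original answer): print the Nth
# "·"-separated field of a word, keeping the trailing "·" unless it is
# the last field.
nth_syllable() {
    printf '%s\n' "$1" | awk -F'·' -v n="$2" '{
        # Re-append the separator only when this is not the final field.
        sep = (n < NF) ? FS : ""
        print $n sep
    }'
}

nth_syllable 'con·grat·u·late' 2   # prints: grat·
```

Passing 4 instead of 2 should print late with no trailing dot, since it is the last field.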

Upvotes: 2

repzero

Reputation: 8402

Since you are looking for the characters between the dots, you can try sed like this to print the group of characters between the 1st and 2nd dot:

echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed  -n 2p|tr -d '.'

results

grat

note: I use 2p to print characters between 1st dot and 2nd dot

To print the group of characters between the 2nd and 3rd dot:

echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed  -n 3p|tr -d '.'

results

u

note: I use 3p to print characters between 2nd dot and 3rd dot

You can also do the whole thing with sed, but I use the tr command so it is easy for you to follow. The tr command deletes the dots before printing. If you want to keep the dots, drop |tr -d '.' from your command line.
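As a rough sketch of what "the whole thing with sed" could look like for the 1st-to-2nd-dot case, a single substitution with a capture group is enough. The function name between_first_two_dots is hypothetical, and this only covers an ASCII "." as used in this answer.

```shell
# Hypothetical helper: print the text between the 1st and 2nd "." of a line.
# -n plus the p flag means nothing is printed when there are fewer than two dots.
between_first_two_dots() {
    printf '%s\n' "$1" | sed -n 's/^[^.]*\.\([^.]*\)\..*$/\1/p'
}

between_first_two_dots 'con.grat.u.late'   # prints: grat
```

This avoids \n in the replacement entirely, so it does not rely on GNU sed behaviour.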

You can also print a range of groups of characters:

echo 'con.grat.u.late'|sed 's/\.*\./&\n/g'|sed  -n 1,3p|tr -d '.'

results

con
grat
u

Upvotes: 2

janos

Reputation: 124734

In GNU sed you can do this:

echo con·grat·u·late | sed -e 's/·/&\n/g'

The & stands for the matched pattern, in this example the ·. Unfortunately this doesn't work in BSD sed, which does not treat \n in the replacement as a newline.

For a more portable solution, I recommend this AWK, which should work in both GNU and BSD systems:

echo con·grat·u·late | awk '{ gsub("·", "&\n") } 1'
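If you would rather stay with sed, one option that should also work outside GNU sed is to put a backslash followed by a literal newline in the replacement, which is the form POSIX describes for splitting a line. A small sketch, with split_after_dots being a hypothetical name:

```shell
# Hypothetical helper: insert a newline after every "·".
# The replacement ends with a backslash and a real line break instead of \n,
# so it does not depend on GNU sed's \n escape.
split_after_dots() {
    printf '%s\n' "$1" | sed 's/·/&\
/g'
}

split_after_dots 'con·grat·u·late'
```

The output should be con·, grat·, u· and late, each on its own line.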

Upvotes: 3

Wintermute

Reputation: 44063

Simply

echo con·grat·u·late | sed -e 's/·/·\n/g'

that replaces every · with a · followed by a newline.

Upvotes: 1
