Álvaro

Reputation: 1359

Filter words starting and ending with hyphen but not when it's found in the middle

I have a list of words that I want to filter: I want only those that start or end with a hyphen, not those with a hyphen in the middle. That is, I want to keep entries like "a-" or "-cefalia" but not "castellano-manchego".

I have tried many options, and the closest thing I've found is grep -E '*\-' minilemario.txt. However, it matches every line that contains a hyphen. Could you please provide a solution? Here is a sample of the list:

    a
    a-
    aarónico
    aaronita
    amuzgo
    an-
    -án
    ana
    -ana
    ana-
    anabaptismo
    anabaptista
    blablá
    bla-bla-bla
    blanca
    castellano
    castellanohablante
    castellano-leonés
    castellano-manchego
    castellanoparlante
    cedulario
    cedulón
    -céfala
    cefalalgia
    cefalálgico
    cefalea
    -cefalia
    cefálica
    cefálico
    cefalitis
    céfalo
    -céfalo
    cefalópodo
    cefalorraquídeo
    cefalotórax
    cefea
    ciabogar
    cian
    cian-
    cianato
    cianea
    cianhídrico
    cianí
    ciánico
    cianita
    ciano-
    cianógeno
    cianosis
    cianótico
    cianuro
    ciar
    ciática
    ciático
    zoo
    zoo-
    zoófago

Upvotes: 1

Views: 247

Answers (2)

Saucier

Reputation: 4360

Here is a Bash-only solution. Please see the comments for details:

#!/usr/bin/env bash

# Assign the first argument (e.g. a textfile) to a variable
input="$1"

# Bash 4 - read the data line by line into an array
readarray -t data < "$input"

# Bash 3 - read the data line by line into an array
#while IFS= read -r line; do
#    data+=("$line")
#done < "$input"

# For each item in the array do something
for item in "${data[@]}"; do

    # Line starts with "-" or ends with "-"
    [[ "$item" =~ ^-|-$ ]] && echo "$item"

done

This will produce the following output:

$ ./script input.txt
a-
an-
-án
-ana
ana-
-céfala
-cefalia
-céfalo
cian-
ciano-
zoo-
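The same filter can also be sketched without the array or the regex, by reading line by line and matching shell glob patterns in a case statement. This is a variant, not the script above: the function name filter_affixes is made up for illustration, and it reads from standard input rather than taking a file name.

```shell
# filter_affixes is a hypothetical helper name, not part of the answer above.
# It reads lines from stdin and prints those that start or end with "-".
filter_affixes() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the second test also handles a last line with no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            # -* matches a leading hyphen, *- matches a trailing hyphen
            -*|*-) printf '%s\n' "$line" ;;
        esac
    done
}
```

It would be called as filter_affixes < input.txt. Words such as castellano-manchego match neither pattern, so they are skipped.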

Upvotes: 0

devnull

Reputation: 123528

Using grep, say:

grep -E '^-|-$' filename

to get the words starting or ending with -. And

grep -v -E '^-|-$' filename

to exclude the words starting or ending with -.

^ and $ are anchors that match the start and end of a line, respectively. Your pattern '*\-' has no anchors, so it matches a - anywhere in the line; nothing in it says that the - must be at the start or at the end.
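To see the anchors at work, here is a small sketch that runs both commands on a few words from the question instead of a file. The variable names sample, affixes and others are only for illustration.

```shell
# A few sample lines taken from the question's word list
sample=$(printf '%s\n' a a- -ana castellano-manchego bla-bla-bla zoo-)

# Keep words that start or end with a hyphen
affixes=$(printf '%s\n' "$sample" | grep -E '^-|-$')

# Keep everything else, including words hyphenated only in the middle
others=$(printf '%s\n' "$sample" | grep -v -E '^-|-$')

printf 'affixes:\n%s\n' "$affixes"
printf 'others:\n%s\n' "$others"
```

Here affixes holds a-, -ana and zoo-, while others holds a, castellano-manchego and bla-bla-bla, because a hyphen in the middle of a word matches neither ^- nor -$.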

Upvotes: 4
