Dalek

Reputation: 4318

extract some part of a list of strings and set them in a list without any repetition

I have a list of file names. I have tried to extract the index between sil. and .asc from each name and put the indexes in a list, without repeating any index. The following is part of my file list.

ellip5.0.apo.3.sil.16.asc
ellip5.0.apo.3.sil.7.asc
ellip5.0.apo.3.sil.8.asc
ellip5.0.apo.4.sil.3.asc
ellip5.0.apo.4.sil.14.asc
ellip5.0.apo.4.sil.5.asc
ellip5.0.apo.4.sil.6.asc
ellip5.0.apo.4.sil.7.asc
ellip5.0.apo.4.sil.8.asc
ellip5.0.apo.5.sil.3.asc
ellip5.0.apo.5.sil.14.asc
ellip5.0.apo.5.sil.5.asc
ellip5.0.apo.5.sil.6.asc
ellip5.0.apo.5.sil.7.asc
ellip5.0.apo.5.sil.8.asc
ellip5.0.apo.6.sil.3.asc
ellip5.0.apo.6.sil.4.asc
ellip5.0.apo.6.sil.5.asc
ellip5.0.apo.6.sil.16.asc
ellip5.0.apo.6.sil.7.asc
ellip5.0.apo.6.sil.8.asc
ellip5.0.apo.7.sil.13.asc
ellip5.0.apo.7.sil.4.asc
ellip5.0.apo.7.sil.5.asc

The following code is my attempt to build the list, but it doesn't work:

args=()
containsElement () {
  local e
  for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
  return 1
}
for MYVAR in  "ellip*.asc"
j=0
for i in $(ls ellip*.asc)
do
  INDEX=`echo $i | grep -oE 'sil.[^/]+.asc' | cut -c5- | rev | cut -c5- | rev`
  listcontains INDEX "${args[@]}" 
  if [ $? == 1 ];then
        args[j]=$INDEX
        j=$(($j + 1))
        echo $INDEX
   fi
done
echo ${args[@]}

Any suggestions would be appreciated. My expected list would be:

16 7 8 3 14 5 6 4 13

and preferably a sorted list.

Upvotes: 1

Views: 52

Answers (3)

Steve Summit

Reputation: 47942

I would use something like

ls ellip*.asc | cut -f 6 -d . | sort -nu

The cut program does just what you want here: it selects the 6th field, using . as the field delimiter. sort -nu then sorts the values numerically and drops duplicates.
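To see which field is which, here is a minimal sketch that runs the same cut and sort on a few names copied from the question. The variables names and indexes are only illustrative, and the names are fed in with printf so no files need to exist.

```shell
# Sample names copied from the question; nothing has to exist on disk.
names='ellip5.0.apo.3.sil.16.asc
ellip5.0.apo.3.sil.7.asc
ellip5.0.apo.4.sil.3.asc
ellip5.0.apo.4.sil.7.asc'

# Splitting on "." numbers the fields like this:
#   1=ellip5  2=0  3=apo  4=3  5=sil  6=16  7=asc
# so field 6 is the index between "sil." and ".asc".
# sort -n sorts numerically and -u keeps one copy of each value.
indexes=$(printf '%s\n' "$names" | cut -d . -f 6 | sort -nu)

printf '%s\n' "$indexes"
```

Note that this relies on every name having the same number of dots before the index; a name with an extra dot earlier on would shift the field count.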

Upvotes: 2

If you don't mind using some external utilities (which you probably don't, as you already use grep, cut and rev in your example), then you can do this in a one-liner:

arr=($(sed 's/ /\n/g' <<< $(echo *.sil.*.asc) |cut -d. -f6 |sort -n |uniq))

This first gets your file list (note that you need echo to feed the file list to sed, since pathnames are not expanded after <<<), breaks it into lines, selects the 6th field with . as the delimiter, and then keeps one copy of each value (note that uniq needs sorted input, which is why sort comes first). The resulting list is then assigned to an array.
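As a possible variation on the same idea, printf can emit one name per line directly, because it reuses its format string for every argument; that removes the need for echo, the here-string and sed. This is only a sketch: the temporary directory and the three sample files are assumptions made so it can run anywhere, and sort -nu stands in for sort -n | uniq.

```shell
#!/usr/bin/env bash
# Throwaway directory with a few sample files (names taken from the question).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch ellip5.0.apo.3.sil.16.asc ellip5.0.apo.3.sil.7.asc ellip5.0.apo.4.sil.7.asc

# printf repeats '%s\n' for each file name, so every name lands on its own line.
# The unquoted $(...) is split on whitespace, which is fine here because the
# extracted indexes contain no spaces or glob characters.
arr=($(printf '%s\n' *.sil.*.asc | cut -d. -f6 | sort -nu))

echo "${arr[@]}"

# Clean up the sample files.
cd - >/dev/null && rm -rf "$tmpdir"
```

If the glob matches nothing, the literal pattern is passed through to cut, so a real script may want to check for that case first.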

Also note that in your example you have:

...
for i in $(ls ellip*.asc)
do
...

Here you parse the output of ls, which you should generally avoid (see here). In this specific case it would probably be safe, as your filenames have a fixed format.
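For comparison, a minimal sketch of the same loop driven by the glob itself rather than by ls. The names found and idx, and the temporary directory with sample files, are assumptions for illustration; the index is cut out with plain parameter expansion instead of the grep/cut/rev chain.

```shell
# Throwaway directory with a few sample files (names taken from the question).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch ellip5.0.apo.3.sil.16.asc ellip5.0.apo.3.sil.7.asc ellip5.0.apo.4.sil.7.asc

found=""
# The shell expands the pattern to one word per file, so names containing
# spaces or newlines would still arrive intact; no ls output is parsed.
for f in ellip*.asc; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
    idx=${f##*.sil.}          # drop everything up to and including ".sil."
    idx=${idx%.asc}           # drop the trailing ".asc"
    found="$found $idx"
done

echo "$found"

# Clean up the sample files.
cd - >/dev/null && rm -rf "$tmpdir"
```

The collected values still contain duplicates at this point; they can be piped through sort -nu as in the other answers.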

Upvotes: 0

anubhava

Reputation: 785108

You can use this script in Bash 4 or later:

# declare an associative array
declare -A arr

for f in ellip*.asc; do
    f="${f/#*sil.}"
    f="${f%.asc}"
    arr["$f"]=1
done

# print sorted index values
printf "%s\n" "${!arr[@]}" | sort -n

Output:

3
4
5
6
7
8
13
14
16

In older versions of Bash, where associative arrays are not supported, use:

declare -a arr

for f in ellip*.asc; do
    f="${f/#*sil.}"
    f="${f%.asc}"
    arr+=("$f")
done

sort -un <(printf "%s\n" "${arr[@]}")

Output:

3
4
5
6
7
8
13
14
16

Upvotes: 2
