Using bash to iterate over dictionary-like formats

I have a Python-like dictionary in an input file:

$ cat test.txt
db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}

Each list in the dictionary has exactly 4 items, no more and no fewer. I need a result like:

one1="a"
two1="b"
three1="c"
four1="d"

one2="aa"
two2="bb"
three2="cc"
four2="dd"

I know this would be simple in Python, but I need to do it in a bash script. Is that possible, and if so, how?

Upvotes: 1

Answers (3)

Ed Morton

Reputation: 203684

This will work robustly using any awk in any shell on all UNIX boxes. It is also trivial to extend to more than 4 items per list: just add more number names to the string in the BEGIN section:

$ cat tst.awk
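BEGIN { split("one two three four",names) }   # names[1]="one" ... names[4]="four"
{
    # Repeatedly find the next "N:['x','y',..." chunk on the line
    while ( match($0,/[0-9]+:\[('[^']*',?)+/) ) {
        idx = list = substr($0,RSTART,RLENGTH)

        sub(/:.*/,"",idx)           # idx  = the key, e.g. "1"
        sub(/[^[]+\[/,"",list)      # list = "'a','b','c','d'"

        # Splitting on single quotes leaves the values in the even-numbered elements
        split(list,items,/'/)
        for (i=2; i in items; i+=2) {
            printf "%s%d=\"%s\"\n", names[i/2], idx, items[i]
        }
        print ""

        # Drop the part already processed and look for the next key
        $0 = substr($0,RSTART+RLENGTH)
    }
}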

$ awk -f tst.awk file
one1="a"
two1="b"
three1="c"
four1="d"

one2="aa"
two2="bb"
three2="cc"
four2="dd"
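As a sketch of the extension described above, the script below writes a hypothetical five-item input file (`test5.txt`) and a copy of `tst.awk` with `five` appended to the names string (`tst5.awk`); both file names are made up for illustration. Nothing else in the program should need to change, because the `for` loop stops at the first missing even-numbered element of `items`.

```bash
#!/bin/bash
# Hypothetical sample input with five items per list
echo "db={1:['a','b','c','d','e'], 2:['aa','bb','cc','dd','ee']}" > test5.txt

# Same program as tst.awk; the only change is "five" added to the names string
cat > tst5.awk <<'EOF'
BEGIN { split("one two three four five",names) }
{
    while ( match($0,/[0-9]+:\[('[^']*',?)+/) ) {
        idx = list = substr($0,RSTART,RLENGTH)
        sub(/:.*/,"",idx)
        sub(/[^[]+\[/,"",list)
        split(list,items,/'/)
        for (i=2; i in items; i+=2) {
            printf "%s%d=\"%s\"\n", names[i/2], idx, items[i]
        }
        print ""
        $0 = substr($0,RSTART+RLENGTH)
    }
}
EOF

awk -f tst5.awk test5.txt
```

This should print five assignments per key (`one1="a"` through `five1="e"`, then `one2="aa"` through `five2="ee"`), each group followed by a blank line.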

Upvotes: 1

M. Nejat Aydin

Reputation: 10133

This can be done with a single sed command (tested with GNU sed 4.8; it assumes the whole expression is on a single line and that no quoted value contains an embedded single quote):

echo "db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}" |
sed -E "s/^[^{]*\{//; s/\}[^}]*$//; s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g"

outputs

one1='a'
two1='b'
three1='c'
four1='d'

one2='aa'
two2='bb'
three2='cc'
four2='dd'
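Because each line of that output is a valid bash assignment, a script can load the values into shell variables by passing the output to `eval`. A minimal sketch, assuming GNU sed (for `\n` in the replacement) and trusted input, since `eval` executes whatever text the command produces; the variable name `line` is illustrative:

```bash
#!/bin/bash
line="db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"

# Generate the assignments with the sed command above, then run them in this shell.
# Only use eval on input you trust: it executes whatever text sed produces.
eval "$(echo "$line" |
    sed -E "s/^[^{]*\{//; s/\}[^}]*$//; s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g")"

echo "$three1 $four2"    # prints: c dd
```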

Explanation:

-E

Use extended regular expressions so that the (, ), + and ? characters don't need to be escaped.

s/^[^{]*\{//;

Deletes everything from the start of the line up to and including the first { character.

s/\}[^}]*$//;

Deletes the last } character and any characters after it at the end of the line.

s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g
  -------    -------   -------   -------   -------   -----  -----------------------------------------------------
     1          2         3         4         5        6                      R

1: Captures the key, i.e. the text before the :
2: Captures the text between the first pair of single quotes
3: Captures the text between the second pair of single quotes
4: Captures the text between the third pair of single quotes
5: Captures the text between the fourth pair of single quotes
6: Captures the comma and any spaces after it. This group is not used in the replacement text; the ? makes it optional, since the last entry has no trailing comma.
R: Replacement text. \1, \2, \3, \4, and \5 are replaced with the corresponding captured text, and \n inserts a newline (a GNU sed extension).
The g flag at the end of the s command ensures that the replacement is applied to all matches.
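To see what the third s command actually works on, you can run just the first two commands against the sample line. A minimal sketch; the variable names `line` and `stripped` are illustrative:

```bash
#!/bin/bash
line="db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"

# Apply only the first two s commands: drop everything up to and including "{",
# then drop the final "}" and anything after it
stripped=$(echo "$line" | sed -E "s/^[^{]*\{//; s/\}[^}]*$//")
echo "$stripped"    # prints: 1:['a','b','c','d'], 2:['aa','bb','cc','dd']
```

The entries are now separated by ", ", which is exactly what group 6 consumes so that the next match starts at the following key.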

Upvotes: 3

Karthik Radhakrishnan

Reputation: 944

You just need to strip off the unnecessary characters and loop over the remaining entries to get your result:

#!/bin/bash
db="{1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"
count=1
# Strip the braces; the unquoted $(...) then splits the entries on whitespace.
# set -f stops the [...] in each entry from being expanded as a glob pattern.
set -f
for items in $(echo "$db" | sed 's/{//;s/}//')
do
        # Keep only the text between [ and ], e.g. 'a','b','c','d'
        list=$(echo "$items" | sed 's/^.*\[//;s/\].*$//')
        echo "one${count} = $(echo "$list" | cut -d ',' -f1)"
        echo "two${count} = $(echo "$list" | cut -d ',' -f2)"
        echo "three${count} = $(echo "$list" | cut -d ',' -f3)"
        echo "four${count} = $(echo "$list" | cut -d ',' -f4)"
        echo ''
        count=$((count + 1))
done
set +f

Output

one1 = 'a'
two1 = 'b'
three1 = 'c'
four1 = 'd'

one2 = 'aa'
two2 = 'bb'
three2 = 'cc'
four2 = 'dd'
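The same strip-and-loop idea can also be done without starting sed and cut for every field. Below is a minimal pure-bash sketch using the `=~` operator and `BASH_REMATCH`; unlike the script above, it prints the requested `one1="a"` format and uses the dictionary key instead of a counter. It assumes the one-line format from the question, values containing no commas, single quotes or `]`, and at most four items per list (the size of the illustrative `names` array). Reading the question's file with `db=$(<test.txt)` should also work, since the regex ignores the leading `db=`.

```bash
#!/bin/bash
# Sketch only: assumes one line of input, values without commas, quotes or ],
# and no more than four items per list (the size of the names array).
db="{1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"   # or: db=$(<test.txt)
names=(one two three four)

# Capture 1 = the key, capture 2 = the quoted items between [ and ]
re="([0-9]+):\[([^]]*)\]"
rest=$db
while [[ $rest =~ $re ]]; do
    key=${BASH_REMATCH[1]}
    list=${BASH_REMATCH[2]//\'/}            # strip the quotes: a,b,c,d
    IFS=',' read -ra items <<< "$list"      # split on commas into an array
    for i in "${!items[@]}"; do
        printf '%s%s="%s"\n' "${names[i]}" "$key" "${items[i]}"
    done
    echo
    rest=${rest#*"${BASH_REMATCH[0]}"}      # drop the entry just processed
done
```

Quoting `${BASH_REMATCH[0]}` inside the `${rest#...}` pattern makes bash treat the brackets and quotes in the matched text literally rather than as glob characters.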

Upvotes: 1
