I have a Python-like dictionary in an input file:
$ cat test.txt
db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}
Each list in the dictionary always has exactly 4 items, no more and no fewer. I need output like:
one1="a"
two1="b"
three1="c"
four1="d"
one2="aa"
two2="bb"
three2="cc"
four2="dd"
I know this would be simple in Python, but I need to do it in a bash script. Is that possible, and if so, how?
This will work robustly using any awk in any shell on all UNIX boxes, and it is trivial to extend to more than 4 items per list: just add more number names to the string in the BEGIN section:
$ cat tst.awk
BEGIN { split("one two three four",names) }
{
    while ( match($0,/[0-9]+:\[('[^']*',?)+/) ) {
        idx = list = substr($0,RSTART,RLENGTH)
        sub(/:.*/,"",idx)
        sub(/[^[]+\[/,"",list)
        split(list,items,/'/)
        for (i=2; i in items; i+=2) {
            printf "%s%d=\"%s\"\n", names[i/2], idx, items[i]
        }
        print ""
        $0 = substr($0,RSTART+RLENGTH)
    }
}
$ awk -f tst.awk file
one1="a"
two1="b"
three1="c"
four1="d"
one2="aa"
two2="bb"
three2="cc"
four2="dd"
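The script above prints the assignments rather than setting them. If you want one1 through four2 as real variables in the current shell, one option is to read that output line by line and assign each value with bash's printf -v, which avoids running eval on data taken from the file. A minimal sketch: assign_vars is a hypothetical helper name, and it assumes every input line is either blank or of the form name="value", as printed above:

```bash
#!/bin/bash
# Hypothetical helper: read lines of the form name="value" from stdin and
# turn each one into a shell variable without using eval.
assign_vars() {
    local name value
    while IFS='=' read -r name value; do
        # Skip blank separator lines and anything not shaped like one1, four2, ...
        [[ $name =~ ^[a-z]+[0-9]+$ ]] || continue
        value=${value#\"}   # strip the opening double quote
        value=${value%\"}   # strip the closing double quote
        printf -v "$name" '%s' "$value"
    done
}

# Demo on a few lines in the format printed by the awk script.
assign_vars < <(printf '%s\n' 'one1="a"' 'two1="b"' '' 'one2="aa"' 'four2="dd"')
echo "$one1 $two1 $one2 $four2"
```

With the awk script, call it as assign_vars < <(awk -f tst.awk file). The process substitution matters: piping awk into assign_vars would run the loop in a subshell, and the variables would be lost when it exits.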
This can be done with a single sed command (tested with GNU sed 4.8; it assumes the whole expression is on a single line and that no quoted value contains an embedded single quote):
echo "db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}" |
sed -E "s/^[^{]*\{//; s/\}[^}]*$//; s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g"
outputs
one1='a'
two1='b'
three1='c'
four1='d'
one2='aa'
two2='bb'
three2='cc'
four2='dd'
Explanation:
-E
Use extended regular expressions, so the (, ) and + characters don't need to be escaped with a backslash.
s/^[^{]*\{//;
Deletes everything from the beginning of the line up to and including the { character.
s/\}[^}]*$//;
Deletes the } character and any trailing characters after it at the end of the line.
s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g
  -------    -------   -------   -------   -------   -----  -----------------------------------------------------
  1          2         3         4         5         6      R
1: Captures the text up to the : (the list's key)
2: Captures the text between the first pair of single quotes
3: Captures the text between the second pair of single quotes
4: Captures the text between the third pair of single quotes
5: Captures the text between the fourth pair of single quotes
6: Captures the , and any number of following space characters. This group is not used in the replacement text; the ? after it makes it optional, since the last list has no trailing comma.
R: Replacement text. \1, \2, \3, \4 and \5 are replaced with the corresponding captured text.
The g flag at the end of the s command applies the replacement to every match on the line, not just the first.
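The output above uses single quotes, while the question asks for double quotes. A minimal variant of the same command, assuming GNU sed (the \n in the replacement text is a GNU extension), that reads from a file and writes double quotes instead; the temporary file is only a stand-in for the question's test.txt so the sketch runs on its own:

```bash
#!/bin/bash
# Sketch: same sed command as above, but reading a file and emitting
# name="value" with double quotes. Requires GNU sed for \n in the replacement.
infile=$(mktemp)   # stand-in for the question's test.txt
printf '%s\n' "db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}" > "$infile"

# Inside the double-quoted sed script, \" becomes a literal " for sed.
result=$(sed -E "s/^[^{]*\{//; s/\}[^}]*$//; s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1=\"\2\"\ntwo\1=\"\3\"\nthree\1=\"\4\"\nfour\1=\"\5\"\n\n/g" "$infile")
rm -f "$infile"

printf '%s\n' "$result"
```

For the real file, replace "$infile" with test.txt and drop the mktemp/rm lines.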
You just need to strip off all the unnecessary characters and loop through what is left to get your result:
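#!/bin/bash
db="{1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"
count=1
# Strip the outer braces; the unquoted $(...) then splits the remaining
# text on whitespace, giving one "key:[...]" chunk per list.
# set -f disables globbing so a chunk's [...] is never treated as a filename pattern.
set -f
for items in $(echo "$db" | sed 's/{//;s/}//')
do
    # Keep only the text between [ and ], e.g. 'a','b','c','d'
    list=$(echo "$items" | sed 's/^.*\[//;s/\].*$//')
    echo one${count} = "$(echo "$list" | cut -d ',' -f1)"
    echo two${count} = "$(echo "$list" | cut -d ',' -f2)"
    echo three${count} = "$(echo "$list" | cut -d ',' -f3)"
    echo four${count} = "$(echo "$list" | cut -d ',' -f4)"
    echo ''
    count=$((count + 1))
done
set +f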
Output
one1 = 'a'
two1 = 'b'
three1 = 'c'
four1 = 'd'
one2 = 'aa'
two2 = 'bb'
three2 = 'cc'
four2 = 'dd'
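The loop above starts several sed and cut processes per list and numbers the lists with its own counter rather than the keys in the data. A sketch of an alternative that uses only bash's built-in regex matching ([[ =~ ]] and the BASH_REMATCH array), takes the key from the input, and prints the double-quoted format from the question. parse_db is a hypothetical function name, and it assumes exactly four single-quoted items per list with no embedded quotes, as the question states:

```bash
#!/bin/bash
# Sketch: parse the dictionary with bash's own regex matching, no sed or cut.
# Assumes exactly four single-quoted items per list, no embedded quotes.
parse_db() {
    local rest=$1 key i
    local names=(one two three four)
    local re="([0-9]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\]"
    while [[ $rest =~ $re ]]; do
        key=${BASH_REMATCH[1]}
        # Capture groups 2..5 hold the four items of this list.
        for i in 0 1 2 3; do
            printf '%s%s="%s"\n' "${names[i]}" "$key" "${BASH_REMATCH[i+2]}"
        done
        echo
        # Drop everything up to and including this match, then search again.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

parse_db "db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"
```

To read the question's file instead of a literal string, call parse_db "$(<test.txt)".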