Reputation: 193
I'm trying to rearrange values from specific strings into their respective columns. Here is the input:
String 1: 47/13528
String 2: 55(s)
String 3:
String 4: 114(n)
String 5: 225(s), 26/10533-10541
String 6: 103/13519
String 7: 10(s), 162(n)
String 8: 152/12345,12346
(d=dead, n=null, s=strike)
The letter in each value is the flag (d=dead, n=null, s=strike). The number of the String becomes the suffix after "c", so the value 47 on "String 1" becomes 47c1, and so on:
String 1: 47/13528
A value without any flag is sorted into the null column, along with the values tagged null (n).
For String 1, the integer 1 is concatenated with 47/13528.
Sorted :
null
47c1@SP13528;114c4;103c6@SP13519;162c7
String 2: 55(s)
Values flagged with (s) are sorted into the strike column.
Sorted :
strike
55c2;225c5;26c5@SP10533-10541;162c7
I tried to parse it by modifying some earlier code, but had no luck:
{
    for (i=1; i<=NF; i++) {
        num = $i+0
        abbr = $i
        gsub(/[^[:alpha:]]/,"",abbr)
        list[abbr] = list[abbr] num " c " val ORS
    }
}
END {
    n = split("dead null strike",types)
    for (i=1; i<=n; i++) {
        name = types[i]
        abbr = substr(name,1,1)
        printf "name,list[abbr]\n"
    }
}
Expected Output (sorted into csv) :
dead,null,strike
,47c1@SP13528;114c4; 26c5@SP10533-10541;103c6@SP13519;162c7, 152c8@SP12345;152c8@SP12346,55c2;225c5;162c7;10c7
Breakdown for crosscheck purpose:
dead
none
null
47c1@SP13528;114c4;103c6@SP13519;162c7;152c8@SP12345;152c8@SP12346;26c5@SP10533-10541;;162c7
strike
55c2;225c5;10c7
Upvotes: 1
Views: 138
Reputation: 5965
Here is an awk script for parsing your file.
BEGIN {
    types["d"]; types["n"]; types["s"]
    deft = "n"; OFS = ","; sep = ";"
}
$1=="String" {
    gsub(/[)(]/,""); gsub(",", " ") # general line subs
    for (i=3;i<=NF;i++) {
        if (!gsub("/","c"$2+0"@SP", $i)) $i = $i"c"$2+0 # make all subs on items
        for (t in types) { if (gsub(t, "", $i)) { x=t; break }; x=deft } #find type
        items[x] = items[x]? items[x] sep $i: $i # append for type found
    }
}
END {
    print "dead" OFS "null" OFS "strike"
    print items["d"] OFS items["n"] OFS items["s"]
}
Input:
String 1: 47/13528
String 2: 55(s)
String 3:
String 4: 114(n)
String 5: 225(s), 26/10533-10541
String 6: 103/13519
String 7: 10(s), 162(n)
String 8: 152/12345,12346
(d=dead, n=null, s=strike)
Output:
> awk -f tst.awk file
dead,null,strike
,47c1@SP13528;114c4;26c5@SP10533-10541;103c6@SP13519;162c7;152c8@SP12345;12346c8,55c2;225c5;10c7
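Note that String 8 yields 12346c8 here, while the question's expected output lists 152c8@SP12346 for it. If that expansion is what is wanted, a minimal sketch of one way to do it follows. The helper name expand_item and its two arguments are hypothetical, and it assumes that everything after the "/" is a comma-separated list of SP numbers belonging to the same value.

```shell
# expand_item <string-number> <raw-item>   (hypothetical helper, not part of the script above)
expand_item() {
    awk -v n="$1" -v item="$2" 'BEGIN {
        cnt = split(item, part, "/")        # part[1] = value, part[2] = SP list (if any)
        if (cnt < 2) { print part[1] "c" n; exit }
        m = split(part[2], sp, ",")         # assumption: commas separate SP numbers
        out = ""
        for (i = 1; i <= m; i++)
            out = out (i > 1 ? ";" : "") part[1] "c" n "@SP" sp[i]
        print out
    }'
}

expand_item 8 '152/12345,12346'   # prints 152c8@SP12345;152c8@SP12346
```

This only works if the item is expanded before the commas are turned into field separators, so it would have to replace the gsub(",", " ") step rather than follow it.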
Your description kept changing on important details, such as how the type of an item is decided and how items are separated, and your input and expected outputs are still not consistent with it, but in general I think you can easily follow what this script does. Keep in mind that gsub() returns the number of substitutions it made, in addition to performing them, so it is often convenient to use it as a condition.
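To illustrate the point about gsub() in isolation, here is a small sketch. The helper name flag_of is made up for this example, and it matches the flag in its original parenthesised form rather than after the parentheses have been stripped, as the script above does.

```shell
# flag_of <item>   (hypothetical helper for illustration)
flag_of() {
    printf '%s\n' "$1" | awk '{
        item = $0
        # gsub() edits item in place and returns how many replacements it made,
        # so a non-zero result means the flag was present
        if (gsub(/\(s\)/, "", item))      print item, "strike"
        else if (gsub(/\(d\)/, "", item)) print item, "dead"
        else { gsub(/\(n\)/, "", item);   print item, "null" }
    }'
}

flag_of '55(s)'      # prints: 55 strike
flag_of '47/13528'   # prints: 47/13528 null
```

The removal of the flag and the test for its presence happen in the same call, which is why the script needs no separate match step.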
Upvotes: 1
Reputation: 140940
My usual approach is to normalize the input into one item per line, then aggregate the items with awk and print them. The following code:
cat <<EOF |
String 1: 47/13528
String 2: 55(s)
String 3:
String 4: 114(n)
String 5: 225(s), 26/10533-10541
String 6: 103/13519
String 7: 10(s), 162(n)
String 8: 152/12345,12346
(d=dead, n=null, s=strike)
EOF
sed '
    # keep only the lines with String
    /^String \([0-9]*\): */!d;
    # remove the String prefix,
    # the : and the spaces, keeping the number
    s//\1 /
    # remove trailing spaces
    s/ *$//
    # remove lines with nothing but the number
    /^[0-9]* *$/d
    # split items on commas
    # by moving them to separate lines;
    # repeat that as long as a comma is found
    : a
    /\([0-9]*\) \(.*\), *\(.*\)/{
        s//\1 \2\n\1 \3/
        ba
    }
' | sed '
    # we should have two fields here,
    # separated by a single space
    /^[^ ]* [^ ]*$/!{
        s/.*/ERROR: "&"/
        q1
    }
    # move the flag in parentheses to a separate column
    /(\(.\))$/{
        s// \1/
        b not
    } ; {
        # default is n
        s/$/ n/
    } ; : not
    # shuffle the first and second fields
    # into the <num>c<num>(@SP<something>)? format
    # if the second field has a "/"
    \~^\([0-9]*\) \([0-9]*\)/\([^ ]*\)~{
        # then add the @SP part
        s//\2c\1@SP\3/
        b not2
    } ; {
        # otherwise just put a "c" between them
        s/\([0-9]*\) \([0-9]*\)/\2c\1/
    } ; : not2
' |
sort -n -k1 |
# now it's trivial
awk '
{
    out[$2] = out[$2] (!length(out[$2])?"":";") $1
}
function outputit(name, idx) {
    print name
    if (length(out[idx]) == 0) {
        print "none"
    } else {
        print out[idx]
    }
    printf "\n"
}
END{
    outputit("dead", "d")
    outputit("null", "n")
    outputit("strike", "s")
}
'
dead
none
null
26c5@SP10533-10541;47c1@SP13528;103c6@SP13519;114c4;152c8@SP12345;162c7;12346c8
strike
10c7;55c2;225c5
I believe the output matches yours except for the order within the ;-separated lists: you seem to sort by the first column and then by the second, whereas I simply sorted with sort.
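If the order from the question's expected output is wanted instead, it appears to follow the number after "c" (the String number), with items from the same String kept in input order. A sketch of a sort stage that would do this, assuming the only "c" on each line is the one between the two numbers; the function name by_string_number is made up, and -s (stable sort) is a GNU/BSD option rather than a POSIX one:

```shell
# by_string_number: hypothetical replacement for the "sort -n -k1" stage
by_string_number() {
    # -t c     split each line at the letter c
    # -k2,2n   compare the number that starts the second field
    # -s       keep the input order for items from the same String
    sort -s -t c -k2,2n
}

printf '%s\n' '10c7 s' '225c5 s' '26c5@SP10533-10541 n' '47c1@SP13528 n' |
    by_string_number
# prints:
# 47c1@SP13528 n
# 225c5 s
# 26c5@SP10533-10541 n
# 10c7 s
```

Since the sample input is already ordered by String number, leaving out the sort stage altogether gives the same order for this particular file.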
Upvotes: 1