Reputation: 77

Regex and positioning items in .csv format

Here is what I need to solve:

Given the following set of letters as the header of a .csv file: H,A,D,E,R,T,Y,B,D

I need to process groups of letters and place them in their proper positions. For example, given the following groups of letters: E,R,T,Y or B,D or T,Y,B,D or H,A,D,E,R, etc.

Each letter has its own fixed position, e.g. "H" is always the first letter of the line, "A" the second, etc. I need to output each group as comma-separated values while keeping every letter in its proper position.

For example, for the group E,R,T,Y I will have: ,,,E,R,T,Y,,
And for H,A,D,E,R I will have: H,A,D,E,R,,,,

My first attempt was to count the number of missing commas, e.g.:

echo "E,R,T,Y" | sed 's/[^,]//g' | awk '{ print length }' | xargs -n 1 bash -c 'echo $((9-$1))' args

Now I'm trying to add the missing commas at the proper positions, but I got stuck at this step.
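For reference, the comma count can also be sketched in pure bash, without the sed/awk/xargs pipeline. This is only a minimal sketch: the function names `count_commas` and `missing_commas` are hypothetical, and it assumes the target is the 8 commas of the 9-field header rather than the literal 9 used in the attempt above.

```shell
header="H,A,D,E,R,T,Y,B,D"

# Print the number of commas in $1 by deleting every non-comma character.
count_commas() {
    local only_commas=${1//[^,]/}
    echo "${#only_commas}"
}

# Print how many commas the group in $1 lacks compared with the header.
missing_commas() {
    echo $(( $(count_commas "$header") - $(count_commas "$1") ))
}

missing_commas "E,R,T,Y"   # header has 8 commas, group has 3
```

The count alone does not say where the commas go, which is why the answers split the header into the part before and the part after the group.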

Upvotes: 4

Answers (3)

potong

Reputation: 58371

This might work for you (GNU sed):

sed -r 's/$/\nH,A,D,E,R,T,Y,B,D/;s/(.*)\n(.*)\1(.*)/\2\n\1\n\3/;h;s/[^,\n]//g;G;s/^(.*)\n.*\n(.*)\n.*\n(.*)\n.*/\1\3\2/' file

Append the full set of letters to the partial line. Place markers (newlines) on either side of the partial within the full set, using a backreference. Copy the result to the hold space, then remove the letters from the pattern space, leaving only the commas and the markers. Append the copy and rearrange the string using the markers.
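The steps described above can be laid out one per line, which may make the command easier to follow. This is the same sed program, only reformatted with comments; the wrapper name `position_group` is hypothetical, and GNU sed is assumed (for `-r` and for `\n` in the replacement and inside the bracket expression).

```shell
# Read partial groups on stdin, print them padded to the header layout.
position_group() {
    sed -r '
        # 1. Append the header after a newline: "E,R,T,Y\nH,A,D,E,R,T,Y,B,D"
        s/$/\nH,A,D,E,R,T,Y,B,D/
        # 2. Locate the partial inside the header: "prefix\npartial\nsuffix"
        s/(.*)\n(.*)\1(.*)/\2\n\1\n\3/
        # 3. Keep a copy of that three-part string in the hold space
        h
        # 4. Drop everything except commas and the newline markers
        s/[^,\n]//g
        # 5. Append the saved copy to the comma-only version
        G
        # 6. Emit: prefix commas, then the partial itself, then suffix commas
        s/^(.*)\n.*\n(.*)\n.*\n(.*)\n.*/\1\3\2/
    '
}

printf '%s\n' "E,R,T,Y" "H,A,D,E,R" | position_group
```

For the input E,R,T,Y, step 2 yields the three parts "H,A,D," / "E,R,T,Y" / ",B,D", so the final line has three commas before the group and two after it.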

Upvotes: 0

glenn jackman

Reputation: 246764

Using bash and GNU grep:

partial() { 
    # $1 is the header
    # $2 is the "substring" line
    local prefix suffix
    prefix=$( grep -oP ".*(?=$2)"  <<<"$1" ) || return 1
    suffix=$( grep -oP "(?<=$2).*" <<<"$1" )
    echo "${prefix//[^,]/}${2}${suffix//[^,]/}"
}
partial "H,A,D,E,R,T,Y,B,D" "B,D"
partial "H,A,D,E,R,T,Y,B,D" "A,D,E"
partial "H,A,D,E,R,T,Y,B,D" "A,D,E,"
partial "H,A,D,E,R,T,Y,B,D" "foo" || echo "foo is not a substring"

,,,,,,,B,D
,A,D,E,,,,,
,A,D,E,,,,,
foo is not a substring

A version that does not rely on grep:

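partial () {
    # $1 is the header
    # $2 is the "substring" line
    local prefix suffix
    prefix=${1%%"$2"*}                  # header text before the match
    [[ $prefix == "$1" ]] && return 1   # nothing was removed: $2 is not in the header
    suffix=${1##*"$2"}                  # header text after the match
    echo "${prefix//[^,]/}${2}${suffix//[^,]/}"
}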
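To apply the grep-free function to several groups at once, one possibility is a small read loop. This is a sketch under assumptions: `position_groups` is a hypothetical helper name, the groups arrive one per line on stdin, and the header is held in a variable. The function is repeated here so the block runs on its own.

```shell
header="H,A,D,E,R,T,Y,B,D"

# Same idea as the grep-free version: split the header around the group
# and keep only the commas of the two surrounding parts.
partial() {
    local prefix suffix
    prefix=${1%%"$2"*}
    [[ $prefix == "$1" ]] && return 1
    suffix=${1##*"$2"}
    echo "${prefix//[^,]/}${2}${suffix//[^,]/}"
}

# Hypothetical helper: position every group read from stdin.
position_groups() {
    local group
    while IFS= read -r group; do
        partial "$header" "$group" || echo "not a substring: $group" >&2
    done
}

printf '%s\n' "E,R,T,Y" "H,A,D,E,R" "B,D" | position_groups
```

Groups that do not occur in the header are reported on stderr, so the positioned lines on stdout stay clean.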

Upvotes: 1

anubhava

Reputation: 784998

The following awk script should work:

s='H,A,D,E,R,T,Y,B,D'

awk -v p='HADER' -F, 'NR==1{for (i=1; i<=NF; i++) 
 {printf "%s%s", index(p, $i)?$i:"", (i<NF)?OFS:RS; sub($i, "", p)} print ""}' OFS=, <<<"$s"
H,A,D,E,R,,,,

awk -v p='ERTY' -F, 'NR==1{for (i=1; i<=NF; i++)
 {printf "%s%s", index(p, $i)?$i:"", (i<NF)?OFS:RS; sub($i, "", p)} print ""}' OFS=, <<<"$s"
,,,E,R,T,Y,,
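Note that the header contains D twice, so a letter-by-letter lookup can pick the first D when the group is B,D. A variant that matches the group as one contiguous, comma-separated run could look like the sketch below. The function name `position_awk` and the variables `header` and `group` are hypothetical; only the standard awk functions `index`, `substr`, `length` and `gsub` are used.

```shell
# $1 is the header, $2 is the comma-separated group.
position_awk() {
    awk -v header="$1" -v group="$2" 'BEGIN {
        # Wrap both strings in commas so the match starts and ends on field boundaries
        h = "," header ","
        g = "," group ","
        pos = index(h, g)
        if (pos == 0) exit 1              # group is not part of the header
        before = substr(h, 1, pos - 1)    # leading comma plus the fields before the group
        after  = substr(h, pos + length(g))
        gsub(/[^,]/, "", before)          # one comma per field before the group
        gsub(/[^,]/, "", after)           # one comma per field after the group
        print before group after
    }'
}

position_awk "H,A,D,E,R,T,Y,B,D" "B,D"
```

Because the whole group is matched at once, the D in B,D can only land in the last column.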

Upvotes: 2
