A.J. Hart
A.J. Hart

Reputation: 143

Removing multiple delimiters between outside delimiters on each line

Using awk or sed in a bash script, I need to remove comma separated delimiters that are located between an inner and outer delimiter. The problem is that wrong values ends up in the wrong columns, where only 3 columns are desired.

For example, I want to turn this:

2020/11/04,Test Account,569.00
2020/11/05,Test,Account,250.00
2020/11/05,More,Test,Accounts,225.00

Into this:

2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00

I've tried to use a few things, testing regex: But I cannot find a solution to only select the commas in order to remove.

regexr sample

Upvotes: 12

Views: 774

Answers (7)

glenn jackman
glenn jackman

Reputation: 246774

A similar answer to Timur's, in awk

awk '
    BEGIN { FS = OFS = "," }
    function join(start, stop, sep,     str, i) {
        str = $start
        for (i = start + 1; i <= stop; i++) {
            str = str sep $i
        }
        return str
    }
    { print $1, join(2, NF-1, " "), $NF }
' file.csv

It's a shame awk doesn't ship with a join function builtin

Upvotes: 1

Sundeep
Sundeep

Reputation: 23667

Yet another perl

$ perl -pe 's/(?:^[^,]*,|,[^,]*$)(*SKIP)(*F)|,/ /g' ip.txt
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
  • (?:^[^,]*,|,[^,]*$) matches first/last field along with the comma character
    • (*SKIP)(*F) this would prevent modification of preceding regexp
  • |, provide , as alternate regexp to be matched for modification

With sed (assuming \n is supported by the implementation, otherwise, you'll have to find a character that cannot be present in the input)

sed -E 's/,/\n/; s/,([^,]*)$/\n\1/; y/,/ /; y/\n/,/'
  • s/,/\n/; s/,([^,]*)$/\n\1/ replace first and last comma with newline character
  • y/,/ / replace all comma with space
  • y/\n/,/ change newlines back to comma

Upvotes: 6

user14473238
user14473238

Reputation:

perl -pe 's{,\K.*(?=,)}{$& =~ y/,/ /r}e' file
sed -e ':a' -e 's/\(,[^,]*\),\([^,]*,\)/\1 \2/; t a' file
awk '{$1=$1","; $NF=","$NF; gsub(/ *, */,","); print}' FS=, file
awk '{for (i=2; i<=NF; ++i) $i=(i>2 && i<NF ? " " : ",") $i} 1' FS=, OFS= file

Upvotes: 7

Timur Shtatland
Timur Shtatland

Reputation: 12347

Use this Perl one-liner:

perl -F',' -lane 'print join ",", $F[0], "@F[1 .. ($#F-1)]", $F[-1];' in.csv

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.

$F[0] : first element of the array @F (= first comma-delimited value).
$F[-1] : last element of @F.
@F[1 .. ($#F-1)] : elements of @F between the second from the start and the second from the end, inclusive.
"@F[1 .. ($#F-1)]" : the above elements, joined on blanks into a string.
join ",", ... : join the LIST "..." on a comma, and return the resulting string.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Upvotes: 7

Ed Morton
Ed Morton

Reputation: 203324

With GNU awk for the 3rd arg to match():

$ awk -v OFS=, '{
     match($0,/([^,]*),(.*),([^,]*)/,a)
     gsub(/,/," ",a[2])
     print a[1], a[2], a[3]
}' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00

or with any awk:

$ awk '
    BEGIN { FS=OFS="," }
    {
        n = split($0,a)
        gsub(/^[^,]*,|,[^,]*$/,"")
        gsub(/,/," ")
        print a[1], $0, a[n]
    }
' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00

Upvotes: 8

RavinderSingh13
RavinderSingh13

Reputation: 133458

awk doesn't support look arounds, we could have it by using match function of awk; using that could you please try following, written and tested with shown samples in GNU awk.

awk '
match($0,/,.*,/){
  val=substr($0,RSTART+1,RLENGTH-2)
  gsub(/,/," ",val)
  print substr($0,1,RSTART) val substr($0,RSTART+RLENGTH-1)
}
' Input_file

Upvotes: 6

Raman Sailopal
Raman Sailopal

Reputation: 12867

awk -F, '{ printf "%s,",$1;for (i=2;i<=NF-2;i++) { printf "%s ",$i };printf "%s,%s\n",$(NF-1),$NF }' file

Using awk, print the first comma delimited field and then loop through the rest of the field up to the last but 2 field printing the field followed by a space. Then for the last 2 fields print the last but one field, a comma and then the last field.

Upvotes: 9

Related Questions