onlyf
onlyf

Reputation: 883

sed - Replace comma after first regex match

i m trying to perform the following substitution on lines of the general format:

BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......

as you see the problem is that its a comma separated file, with a specific field containing a comma decimal. I would like to replace that with a dot .

I ve tried this, to replace the first occurence of a pattern after match, but to no avail, could someone help me?

sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"

Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:

gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'

That yielded the same file unchanged.

Upvotes: 0

Views: 1330

Answers (6)

Ed Morton
Ed Morton

Reputation: 203995

If your input always follows that format of only one quoted field containing 1 comma then all you need is:

$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......

If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.

Upvotes: 1

potong
potong

Reputation: 58473

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file

This regexp matches a , within a pair of "'s and replaces it by a .. The regexp is anchored to the start of the line and thus needs to be repeated until no further matches can be matched, hence the :a and the ta commands which causes the substitution to be iterated over whilst any substitution is successful.

N.B. The solution expects that all double quotes are matched and that no double quotes are quoted i.e. \" does not appear in a line.

Upvotes: 1

kvantour
kvantour

Reputation: 26491

The following sed will convert all decimal separators in quoted numeric fields:

 sed 's/"\([-+]\?[0-9]*\)[,]\?\([0-9]\+\([eE][-+]\?[0-9]+\)\?\)"/"\1.\2"/g'

See: https://www.regular-expressions.info/floatingpoint.html

Upvotes: 1

Borislav Markov
Borislav Markov

Reputation: 1735

In order to use regexp as in perl you have to activate extended regular expression with -r. So if you want to replace all numbers and omit the " sign, then you can use this:

echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/g'

If you want to replace first occurrence only you can use that:

echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/1'

https://www.gnu.org/software/sed/manual/sed.txt

Upvotes: 0

Skeptic Biped
Skeptic Biped

Reputation: 92

Assuming you have this: BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC

Try this: awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","

Output will be: BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC

You simply need to know the field numbers for replacing the field separator between them.

Upvotes: 0

Inian
Inian

Reputation: 85780

CSV files should be parsed in awk with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields to do the substitution you need

gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," } 
       { for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file

See this answer of mine to understand how to define and parse CSV file content with FPAT variable. Also see Save modifications in place with awk to do in-place file modifications like sed -i''.

Upvotes: 3

Related Questions