Circo

Reputation: 43

Add delimiters at specific indexes

I want to add a delimiter at specific character positions on each line of a file.

I have a file with data:

10100100010000
20200200020000

And I know the offsets of the column boundaries (after characters 2, 5 and 9).

With this sed command:

sed 's/\(.\{2\}\)/&,/;s/\(.\{6\}\)/&,/;s/\(.\{11\}\)/&,/' myFile

I get the expected output:

10,100,1000,10000 
20,200,2000,20000

But with a large number of columns (~200) and rows (~300k), it is really slow.

Is there an efficient alternative?

Upvotes: 4

Views: 161

Answers (4)

RavinderSingh13

Reputation: 133528

1st solution: With GNU awk, could you please try the following:

awk -v OFS="," '{$1=$1}1' FIELDWIDTHS="2 3 4 5"  Input_file

2nd solution: Using sed, try the following:

sed 's/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/' Input_file

3rd solution: awk solution using substr.

awk 'BEGIN{OFS=","} {print substr($0,1,2) OFS substr($0,3,3) OFS substr($0,6,4) OFS substr($0,10,5)}' Input_file

In the substr solution above, I take 5 characters with substr($0,10,5). If you want all characters from the 10th position to the end of the line instead, use substr($0,10).

Output will be as follows.

10,100,1000,10000
20,200,2000,20000
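With ~200 columns, hard-coding one substr() call per column gets unwieldy. The same idea can be written as a loop over a list of offsets. A minimal sketch, assuming bash and any POSIX awk; add_commas is a hypothetical helper name, and its first argument is the same list of offsets given in the question:

```shell
# add_commas is a hypothetical helper: $1 is a space-separated list of
# offsets (a comma goes after each of these positions), any further
# arguments are input files (stdin is read if there are none).
add_commas() {
    awk -v offsets="$1" '
        BEGIN { n = split(offsets, o, " ") }    # o[1..n] = comma positions
        {
            out = ""; prev = 0
            for (i = 1; i <= n; i++) {
                # field i runs from prev+1 up to and including o[i]
                out = out substr($0, prev + 1, o[i] - prev) ","
                prev = o[i]
            }
            print out substr($0, prev + 1)      # rest of the line
        }' "${@:2}"
}

printf '%s\n' 10100100010000 20200200020000 | add_commas '2 5 9'
# 10,100,1000,10000
# 20,200,2000,20000
```

For a real file this would be called as add_commas '2 5 9' myFile. Each line is still read only once; the loop just replaces the four hand-written substr() calls.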

Upvotes: 8

Benjamin W.

Reputation: 52152

If you start the substitutions from the back, you can use the numeric flag of the s command to specify which occurrence of any character you'd like to append a comma to:

$ sed 's/./&,/9;s/./&,/5;s/./&,/2' myFile
10,100,1000,10000
20,200,2000,20000
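A quick way to see why the order matters, using the first sample line. This is only an illustration of the numeric flag; it assumes GNU sed, and the variable name line is arbitrary:

```shell
line=10100100010000

# Front to back with the original positions: every inserted comma pushes the
# characters after it one place right, so the later commas land too early.
echo "$line" | sed 's/./&,/2;s/./&,/5;s/./&,/9'    # 10,10,010,0010000

# Back to front: inserting after position 9 does not move positions 5 or 2.
echo "$line" | sed 's/./&,/9;s/./&,/5;s/./&,/2'    # 10,100,1000,10000

# Front to back also works if each position is increased by the number of
# commas already inserted (2, 5+1, 9+2), as in the question's command.
echo "$line" | sed 's/./&,/2;s/./&,/6;s/./&,/11'   # 10,100,1000,10000
```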

You could automate that a bit further by building the command with a printf statement:

printf -v cmd 's/./&,/%d;' 9 5 2
sed "$cmd" myFile

or even wrap that in a little shell function so we don't have to care about listing the columns in reverse order:

gencmd() {
    # A local IFS keeps the newline separator from leaking into the caller
    local arr IFS=$'\n'
    # Sort arguments in descending order
    arr=($(sort -nr <<< "$*"))
    printf 's/./&,/%d;' "${arr[@]}"
}

sed "$(gencmd 2 5 9)" myFile

Upvotes: 1

Ed Morton

Reputation: 203645

With GNU awk for FIELDWIDTHS:

$ awk -v FIELDWIDTHS='2 3 4 *' -v OFS=',' '{$1=$1}1' file
10,100,1000,10000
20,200,2000,20000

You'll need a newer version of gawk for * at the end of FIELDWIDTHS to mean "whatever's left"; with older versions, just use a large number like 999 instead.
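FIELDWIDTHS takes the width of each field, while the question lists the offsets where the commas go (2 5 9). With ~200 columns it may be easier to derive the widths than to type them. A minimal sketch, assuming bash; offsets_to_widths and fw are hypothetical names:

```shell
# offsets_to_widths is a hypothetical helper: it turns cumulative comma
# offsets (2 5 9) into field widths (2 3 4) and appends '*' for "whatever's
# left" (newer gawk only; replace '*' with e.g. 999 for older versions).
offsets_to_widths() {
    local prev=0 off widths=()
    for off in "$@"; do
        widths+=( $(( off - prev )) )   # width = distance from previous offset
        prev=$off
    done
    printf '%s *\n' "${widths[*]}"      # joined with spaces (default IFS)
}

fw=$(offsets_to_widths 2 5 9)
echo "$fw"    # 2 3 4 *
```

The result can then be passed as awk -v FIELDWIDTHS="$fw" -v OFS=',' '{$1=$1}1' file.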

Upvotes: 1

Aaron

Reputation: 24812

Modifying your sed command so that it adds all the separators in one pass would likely make it perform better:

sed 's/^\(.\{2\}\)\(.\{3\}\)\(.\{4\}\)/\1,\2,\3,/' myFile

Or with extended regular expressions:

sed -E 's/(.{2})(.{3})(.{4})/\1,\2,\3,/' myFile

Output:

10,100,1000,10000
20,200,2000,20000
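One caveat for the ~200 columns in the question: sed back-references only go from \1 to \9, so a single s command of this shape can capture at most nine fields (the awk approaches have no such limit). A minimal sketch that builds the one-pass command from a list of widths and refuses more than nine, assuming bash and GNU sed; gen_ere is a hypothetical name:

```shell
# gen_ere is a hypothetical helper: build a one-pass sed -E command from
# field widths. The last field is left alone, as in the commands above.
gen_ere() {
    if (( $# > 9 )); then
        echo "gen_ere: at most 9 widths (sed back-references stop at \\9)" >&2
        return 1
    fi
    local re='^' repl='' i=1 w
    for w in "$@"; do
        re+="(.{$w})"      # capture the next field of width $w
        repl+="\\$i,"      # put it back followed by a comma
        i=$(( i + 1 ))
    done
    printf 's/%s/%s/' "$re" "$repl"
}

gen_ere 2 3 4; echo    # s/^(.{2})(.{3})(.{4})/\1,\2,\3,/
printf '%s\n' 10100100010000 20200200020000 | sed -E "$(gen_ere 2 3 4)"
```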

Upvotes: 5
