Reputation: 43
I want to insert a delimiter at specific offsets in each line of a file.
I have a file with data:
10100100010000
20200200020000
And I know the offset of each column (2, 5 and 9)
With this sed command: sed 's/\(.\{2\}\)/&,/;s/\(.\{6\}\)/&,/;s/\(.\{11\}\)/&,/' myFile
I get the expected output:
10,100,1000,10000
20,200,2000,20000
but with a large number of columns (~200) and rows (300k) it is really slow.
Is there an efficient alternative?
Upvotes: 4
Views: 161
Reputation: 133528
1st solution: With GNU awk
Could you please try the following:
awk -v OFS="," '{$1=$1}1' FIELDWIDTHS="2 3 4 5" Input_file
2nd Solution: Using sed
Try the following:
sed 's/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/' Input_file
3rd solution: awk
A solution using substr:
awk 'BEGIN{OFS=","} {print substr($0,1,2) OFS substr($0,3,3) OFS substr($0,6,4) OFS substr($0,10,5)}' Input_file
In the substr solution above, I take 5 characters with substr($0,10,5). If you want everything from the 10th position onward, use substr($0,10), which prints the rest of the line.
Output will be as follows.
10,100,1000,10000
20,200,2000,20000
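With ~200 columns, writing out each substr call by hand gets tedious. The same substr idea can be driven by a list of offsets instead. This is only a minimal sketch: add_delims is a hypothetical helper name, and it assumes the offsets are 1-based, ascending, and passed as one space-separated string. It uses only split and substr, so it should not need GNU awk.

```shell
# Hypothetical helper: insert a comma after each 1-based character offset.
# Assumes the offsets are ascending and space-separated, e.g. "2 5 9".
add_delims() {
    awk -v offsets="$1" '
        BEGIN { n = split(offsets, off, " ") }   # off[1..n] holds the offsets
        {
            out = ""; start = 1
            for (i = 1; i <= n; i++) {
                # Copy the characters from start up to and including off[i]
                out = out substr($0, start, off[i] - start + 1) ","
                start = off[i] + 1
            }
            # Append whatever is left after the last offset
            print out substr($0, start)
        }'
}
```

Usage would be add_delims "2 5 9" < Input_file. Note that a line shorter than the last offset still gets every comma appended, so this sketch assumes all lines are long enough.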
Upvotes: 8
Reputation: 52152
If you start the substitutions from the back, you can use the number flag of the s command to specify which occurrence of any character you'd like to append a comma to:
$ sed 's/./&,/9;s/./&,/5;s/./&,/2' myFile
10,100,1000,10000
20,200,2000,20000
You could automate that a bit further by building the command with a printf statement:
printf -v cmd 's/./&,/%d;' 9 5 2
sed "$cmd" myFile
or even wrap that in a little shell function so we don't have to care about listing the columns in reverse order:
gencmd() {
    # Keep IFS local so the newline setting does not leak into the caller
    local arr IFS=$'\n'
    # Sort arguments in descending order
    arr=($(sort -nr <<< "$*"))
    printf 's/./&,/%d;' "${arr[@]}"
}
sed "$(gencmd 2 5 9)" myFile
Upvotes: 1
Reputation: 203645
With GNU awk for FIELDWIDTHS:
$ awk -v FIELDWIDTHS='2 3 4 *' -v OFS=',' '{$1=$1}1' file
10,100,1000,10000
20,200,2000,20000
You'll need a newer version of gawk for * at the end of FIELDWIDTHS to mean "whatever's left"; with an older version, just choose a large number like 999.
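Since the question gives column offsets (2, 5, 9) rather than widths, the FIELDWIDTHS string could be derived instead of typed out for ~200 columns. A rough sketch, where offsets_to_widths is a hypothetical helper name and the offsets are assumed to be ascending:

```shell
# Hypothetical helper: turn ascending offsets (2 5 9) into widths ("2 3 4 *").
offsets_to_widths() {
    local prev=0 off widths=""
    for off in "$@"; do
        # Each width is the distance from the previous offset
        widths="$widths$((off - prev)) "
        prev=$off
    done
    # A trailing "*" stands for the rest of the line
    printf '%s*\n' "$widths"
}

# Assumes a gawk recent enough to accept a trailing "*" in FIELDWIDTHS:
# gawk -v FIELDWIDTHS="$(offsets_to_widths 2 5 9)" -v OFS=',' '{$1=$1}1' file
```

With an older gawk, the "*" in the printf format could be swapped for a large number such as 999, as mentioned above.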
Upvotes: 1
Reputation: 24812
Modifying your sed command to add all the separators in one shot would likely make it perform better:
sed 's/^\(.\{2\}\)\(.\{3\}\)\(.\{4\}\)/\1,\2,\3,/' myFile
Or with extended regular expression:
sed -E 's/(.{2})(.{3})(.{4})/\1,\2,\3,/' myFile
Output:
10,100,1000,10000
20,200,2000,20000
Upvotes: 5