Reputation: 75
I need to replace all occurrences of a string after the nth occurrence in every line of a Unix file.
My file data:
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
My expected output:
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
I tried using sed: sed 's/://3g' test.txt
Unfortunately, the g option combined with an occurrence number is not working as expected. Instead, it is replacing all the occurrences.
Upvotes: 4
Views: 719
Reputation: 204638
With GNU awk for the 3rd arg to match() and gensub():
$ awk 'match($0,/(:[^:]+:)(.*)/,a){ $0=a[1] gensub(/:/,"","g",a[2]) } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
and with any awk in any shell on every Unix box:
$ awk 'match($0,/:[^:]+:/){ tgt=substr($0,1+RLENGTH); gsub(/:/,"",tgt); $0=substr($0,1,RLENGTH) tgt } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
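Since the question asks about the nth occurrence in general, the same idea can be parametrised. This is only a sketch and not part of the answer above: strip_after_nth is a hypothetical function name, and it assumes the delimiter is a single literal character. It relies on index() and substr() only, so it should not need any GNU extensions.

```shell
# Hypothetical helper: keep the first n delimiters, drop all later ones.
# Usage: strip_after_nth DELIM N < file   (DELIM is assumed to be one literal character)
strip_after_nth() {
  awk -v d="$1" -v n="$2" '{
    head = ""; rest = $0
    # Move the first n delimiters, and the text before each, into head.
    for (k = 0; k < n && (p = index(rest, d)) > 0; k++) {
      head = head substr(rest, 1, p)
      rest = substr(rest, p + 1)
    }
    # Copy what is left, skipping every remaining delimiter.
    out = ""
    while ((p = index(rest, d)) > 0) {
      out = out substr(rest, 1, p - 1)
      rest = substr(rest, p + 1)
    }
    print head out rest
  }'
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' | strip_after_nth : 2
# prints :account_id:123456789MelbourneAus
```

Because the delimiter is matched with index() rather than a regex, characters such as . or * need no escaping.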
Upvotes: 3
Reputation: 627536
You can use a perl solution like
perl -pe 's~^(?:[^:]*:){2}(*SKIP)(?!)|:~~g if /^:account_id:/' test.txt
See the online demo and the regex demo.
The ^(?:[^:]*:){2}(*SKIP)(?!)|: regex means:
^(?:[^:]*:){2}(*SKIP)(?!) - match:
    ^ - start of string (here, a line)
    (?:[^:]*:){2} - two occurrences of any zero or more chars other than a : and then a : char
    (*SKIP)(?!) - skip the match and go on to search for the next match from the failure position
| - or
: - match a : char.
And only run the replacement if the current line starts with :account_id: (see if /^:account_id:/).
Or an awk solution like
awk 'BEGIN{OFS=FS=":"} /^:account_id:/ {result="";for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result}' test.txt
See this online demo. Details:
BEGIN{OFS=FS=":"} - sets the input/output field separator to :
/^:account_id:/ - line must start with :account_id:
result="" - sets the result variable to an empty string
for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result} - iterates over the fields and, if the field number is greater than 2, just appends the current field value to result; otherwise, appends the value + output field separator; then prints the result.
Upvotes: 3
Reputation: 58578
This might work for you (GNU sed):
sed 's/:/\n/3;h;s/://g;H;g;s/\n.*\n//' file
Replace the third occurrence of : with a newline.
Make a copy of the line.
Delete all occurrences of : in the line.
Append the amended line to the copy.
Join the two lines by removing everything from the newline marker in the copy up to the newline marker in the amended line.
N.B. A newline is the best delimiter to use with sed, as the lines presented to sed's commands are initially devoid of newlines. However, the important property of the delimiter is that it is unique, so any character can be used as long as it is not found anywhere in the data set.
An alternative solution uses a loop to remove all :'s after the first two:
sed -E ':a;s/^(([^:]*:){2}[^:]*):/\1/;ta' file
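The loop version generalises to other values of n by substituting the repeat count. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do); strip_colons_after is a hypothetical wrapper name, not something from the answer:

```shell
# Hypothetical wrapper: keep the first N colons on each line, delete the rest.
# Usage: strip_colons_after N < file
strip_colons_after() {
  n=$1
  # Each pass removes the first colon that follows N colon-terminated fields;
  # "ta" jumps back to label a for as long as a substitution succeeded.
  sed -E -e ':a' -e "s/^(([^:]*:){$n}[^:]*):/\\1/" -e 'ta'
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' | strip_colons_after 2
# prints :account_id:123456789MelbourneAus
```

The script is split across several -e options so that the label ends at the end of its own expression.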
Upvotes: 3
Reputation: 133770
With GNU awk, using gensub, please try the following. This is based entirely on your shown samples, where the OP wants to remove : from the 3rd occurrence onwards. It uses gensub to split each line into the part up to and including the 2nd colon and the rest, then removes all colons from the rest (from the 3rd colon onwards), as per the OP's requirement.
awk -v regex="^([^:]*:)([^:]*:)(.*)" '
{
firstPart=restPart=""
firstPart=gensub(regex, "\\1\\2", "1", $0)
restPart=gensub(regex,"\\3","1",$0)
gsub(/:/,"",restPart)
print firstPart restPart
}
' Input_file
Upvotes: 5
Reputation: 36838
I would use GNU AWK in the following way, if n is fixed and equal to 2. Let file.txt content be
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
then
awk 'BEGIN{FS=":";OFS=""}{$2=FS $2 FS;print}' file.txt
output
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
Explanation: use : as the field separator and nothing as the output field separator. This by itself removes every :, so I add back the ones which have to be preserved: the 1st (before the second column) and the 2nd (after the second column). Beware that I tested it solely on this data, so if you want to use it you should first test it with more possible inputs.
(tested in gawk 4.2.1)
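One input worth testing is a line with fewer than two colons: assigning to $2 there would append colons that were never in the data (a:b would come out as a:b: and plain as plain::). A possible guard, as a sketch only, is to touch $2 only when NF > 2; keep_two_colons is a hypothetical function name:

```shell
# Hypothetical variant: lines with fewer than two colons pass through unchanged.
keep_two_colons() {
  awk 'BEGIN { FS = ":"; OFS = "" } NF > 2 { $2 = FS $2 FS } 1'
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' 'a:b' 'plain' | keep_two_colons
# prints:
# :account_id:123456789MelbourneAus
# a:b
# plain
```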
Upvotes: 3
Reputation: 53508
I have inferred based on the limited data you've given us, so it's possible this won't work. But I wouldn't use regex for this job. What you have there is colon delimited fields.
So I'd approach it using split to extract the data, and then some form of string formatting to reassemble exactly what you like:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ( undef, $first, @rest ) = split /:/;
print ":$first:", join ( "", @rest ),"\n";
}
__DATA__
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
This gives you the desired result, whilst IMO being considerably clearer for the next reader than a complicated regex.
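The same field-based idea can be tried straight from the shell with cut and tr. This is a rough sketch rather than a drop-in replacement: fields_join is a hypothetical name, it assumes every line has at least three colon-separated fields, and it starts several processes per line, so it will be slow on large files.

```shell
# Hypothetical helper: treat each line as colon-delimited fields.
# Fields 1-2 keep their separator; fields 3 onwards are joined without one.
fields_join() {
  while IFS= read -r line; do
    head=$(printf '%s\n' "$line" | cut -d: -f1-2)
    rest=$(printf '%s\n' "$line" | cut -d: -f3- | tr -d ':')
    printf '%s:%s\n' "$head" "$rest"
  done
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' | fields_join
# prints :account_id:123456789MelbourneAus
```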
Upvotes: 4
Reputation: 16997
Another approach using awk
awk -v c=':' -v n=2 'BEGIN{
FS=OFS=""
}
{
j=0;
for(i=0; ++i<=NF;)
if($i==c && j++>=n)$i=""
}1' file
$ cat file
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
$ awk -v c=':' -v n=2 'BEGIN{FS=OFS=""}{j=0;for(i=0; ++i<=NF;)if($i==c && j++>=n)$i=""}1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
Upvotes: 8