nag

Reputation: 75

Replacing all occurrences after the nth occurrence in a line in Perl

I need to replace all occurrences of a string after the nth occurrence in every line of a Unix file.

My file data:

:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus

My output data:

:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

I tried using sed: sed 's/://3g' test.txt

Unfortunately, the g flag combined with an occurrence number is not working as expected; instead, it replaces all the occurrences.

Upvotes: 4

Views: 719

Answers (7)

Ed Morton

Reputation: 204638

With GNU awk for the 3rd arg to match() and gensub():

$ awk 'match($0,/(:[^:]+:)(.*)/,a){ $0=a[1] gensub(/:/,"","g",a[2]) } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

and with any awk in any shell on every Unix box:

$ awk 'match($0,/:[^:]+:/){ tgt=substr($0,1+RLENGTH); gsub(/:/,"",tgt); $0=substr($0,1,RLENGTH) tgt } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

Upvotes: 3

Wiktor Stribiżew

Reputation: 627536

You can use a Perl solution like

perl -pe 's~^(?:[^:]*:){2}(*SKIP)(?!)|:~~g if /^:account_id:/' test.txt

See the online demo and the regex demo.

The ^(?:[^:]*:){2}(*SKIP)(?!)|: regex means:

  • ^(?:[^:]*:){2}(*SKIP)(?!) - match
    • ^ - start of string (here, a line)
    • (?:[^:]*:){2} - two occurrences of any zero or more chars other than a : and then a : char
    • (*SKIP)(?!) - skip the match and go on to search for the next match from the failure position
  • | - or
  • : - match a : char.

The replacement only runs if the current line starts with :account_id: (see the if /^:account_id:/ modifier).

Or an awk solution like

awk 'BEGIN{OFS=FS=":"} /^:account_id:/ {result="";for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result}' test.txt

See this online demo. Details:

  • BEGIN{OFS=FS=":"} - sets the input/output field separator to :
  • /^:account_id:/ - line must start with :account_id:
  • result="" - sets result variable to an empty string
  • for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result} - iterates over the fields and if the field number is greater than 2, just append the current field value to result, else, append the value + output field separator; then print the result.
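
The loop above hard-codes 2 as the number of separators to keep. A minimal sketch of the same field loop with that count passed in as a variable might look like this. The names n and keep_first_n_seps are illustrative and not part of the answer, the /^:account_id:/ filter is left out so every line is printed, and the extra i < NF check is an added assumption so that lines with fewer than n separators do not gain a trailing colon.

```shell
# Keep the first n ":" separators on each line and drop the rest.
# n and keep_first_n_seps are hypothetical names used for illustration.
keep_first_n_seps() {
    awk -v n="$1" 'BEGIN { FS = OFS = ":" }
    {
        result = ""
        for (i = 1; i <= NF; ++i) {
            # Re-insert the separator only after the first n fields,
            # and never after the last field.
            result = result $i ((i <= n && i < NF) ? OFS : "")
        }
        print result
    }'
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' | keep_first_n_seps 2
# :account_id:123456789MelbourneAus
```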

Upvotes: 3

potong

Reputation: 58578

This might work for you (GNU sed):

sed 's/:/\n/3;h;s/://g;H;g;s/\n.*\n//' file

Replace the third occurrence of : by a newline.

Make a copy of the line.

Delete all occurrences of :'s.

Append the amended line to the copy.

Join the two lines by removing everything from the newline marker in the copy through the newline marker in the amended line.

N.B. A newline is the best delimiter to use with sed, because the line presented to sed's commands is initially devoid of newlines. However, the important property of the delimiter is that it is unique, so any character can be used as long as it is not found anywhere in the data set.
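
As the note says, any character that never occurs in the data can stand in for the newline. The sketch below uses | as that marker, which is an assumption about the data, and it also assumes every line contains at least three colons. The function name is illustrative. Using a plain character also sidesteps \n in the replacement text, whose meaning POSIX leaves unspecified.

```shell
# Same hold-space technique, with "|" as the marker instead of a newline.
# Assumes "|" never appears in the input and each line has >= 3 colons.
strip_after_second_colon() {
    sed -e 's/:/|/3' -e 'h' -e 's/://g' -e 'H' -e 'g' -e 's/|.*|//'
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' | strip_after_second_colon
# :account_id:123456789MelbourneAus
```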

An alternative solution uses a loop to remove all :'s after the first two:

sed -E ':a;s/^(([^:]*:){2}[^:]*):/\1/;ta' file
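
To see what the loop does, the sketch below prints the pattern space at the top of every pass; each pass removes exactly one colon, namely the first one after the second. The -n and p additions are for illustration only, and the commands are split into separate -e expressions on the assumption that this is friendlier to seds that are strict about labels.

```shell
# Print the pattern space before each substitution attempt.
# trace_colon_loop is a hypothetical name used for illustration.
trace_colon_loop() {
    sed -n -E -e ':a' -e 'p' -e 's/^(([^:]*:){2}[^:]*):/\1/' -e 'ta'
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' | trace_colon_loop
# :account_id:12345:6789:Melbourne:Aus
# :account_id:123456789:Melbourne:Aus
# :account_id:123456789Melbourne:Aus
# :account_id:123456789MelbourneAus
```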

Upvotes: 3

RavinderSingh13

Reputation: 133770

With GNU awk, using gensub, please try the following. This is based entirely on the shown samples, where the OP wants to remove : from the 3rd occurrence onwards. gensub splits each line into two parts, and all colons are then removed from the second part (from the 3rd colon onwards), as per the OP's requirement.

awk -v regex="^([^:]*:)([^:]*:)(.*)" '
{
  firstPart=restPart=""
  firstPart=gensub(regex, "\\1\\2", "1", $0)
  restPart=gensub(regex,"\\3","1",$0)
  gsub(/:/,"",restPart)
  print firstPart restPart
}
' Input_file

Upvotes: 5

Daweo

Reputation: 36838

I would use GNU AWK in the following way if n is fixed and equal to 2. Let the content of file.txt be

:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus

then

awk 'BEGIN{FS=":";OFS=""}{$2=FS $2 FS;print}' file.txt

output

:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

Explanation: I use : as the field separator and an empty string as the output field separator. This by itself removes every :, so I add back the ones that have to be preserved: the 1st (before the second field) and the 2nd (after the second field). Beware that I tested it solely on this data, so test it with more varied input before relying on it.
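
The explanation relies on a detail of awk: $0 is only rebuilt with OFS when a field is assigned. A small sketch of that behaviour, using a shortened sample line chosen for illustration, is shown below. It is meant to show why assigning $2 is enough to drop every colon, not to replace the command above.

```shell
line=':account_id:12345:6789'

# No field is assigned, so $0 is printed untouched.
printf '%s\n' "$line" | awk 'BEGIN { FS = ":"; OFS = "" } { print }'
# :account_id:12345:6789

# Assigning any field rebuilds $0 by joining the fields with OFS (empty here),
# so every colon disappears.
printf '%s\n' "$line" | awk 'BEGIN { FS = ":"; OFS = "" } { $1 = $1; print }'
# account_id123456789

# Putting the two wanted colons back into $2 restores them.
printf '%s\n' "$line" | awk 'BEGIN { FS = ":"; OFS = "" } { $2 = FS $2 FS; print }'
# :account_id:123456789
```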

(tested in gawk 4.2.1)

Upvotes: 3

Sobrique

Reputation: 53508

I have inferred based on the limited data you've given us, so it's possible this won't work. But I wouldn't use regex for this job. What you have there is colon delimited fields.

So I'd approach it using split to extract the data, and then some form of string formatting to reassemble exactly what you like:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
  chomp;
  my ( undef, $first, @rest ) = split /:/; 
  print ":$first:", join ( "", @rest ),"\n";
}

__DATA__
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus

This gives you the desired result, whilst IMO being considerably clearer for the next reader than a complicated regex.
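
For comparison, the same split-and-rejoin idea can be sketched in plain POSIX shell without Perl. This is only an illustration under the same assumptions as the script above (every line starts with a colon and has at least two of them), and join_after_second is a hypothetical name.

```shell
# Split on the first two colons with parameter expansion, then strip the
# remaining colons with tr. join_after_second is a hypothetical name.
join_after_second() {
    while IFS= read -r line; do
        rest=${line#*:}       # drop everything up to and including the 1st colon
        first=${rest%%:*}     # the field between the 1st and 2nd colon
        rest=${rest#*:}       # everything after the 2nd colon
        printf ':%s:%s\n' "$first" "$(printf '%s' "$rest" | tr -d ':')"
    done
}

join_after_second <<'EOF'
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
EOF
# :account_id:123456789MelbourneAus
# :account_id:9876543210AdelaideAus
```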

Upvotes: 4

Akshay Hegde

Reputation: 16997

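Another approach using awk (it relies on an empty FS splitting the line into one field per character, which POSIX leaves undefined but gawk supports):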

awk -v c=':' -v n=2 'BEGIN{
                       FS=OFS=""
                     }
                     {
                       j=0;
                       for(i=0; ++i<=NF;)
                         if($i==c && j++>=n)$i=""
                     }1' file

$ cat file
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus

$ awk -v c=':' -v n=2 'BEGIN{FS=OFS=""}{j=0;for(i=0; ++i<=NF;)if($i==c && j++>=n)$i=""}1' file 
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

Upvotes: 8
