mac_online
mac_online

Reputation: 370

Find and Replace Pipe delimiter from field in a pipe delimited file

I have had a similar question like this earlier later i've to add more scope to that question but had no idea how to edit it and make it live again. that's why i'm posting as a new Question.

My file is a pipe delimited file.

 NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
 ABCD | 04  | GO|OGLE | EUROPE | EURO   | PARIS
 XYZE | 12  | Y|A|HOO | USA    | DOLLAR | SEATTLE
 LMNO | 17  | |FACE|B|O|O|K | ASIA | ASIAN DOLLAR | HONGKONG
 EDDE | 98  | A||M|AZ|ON| | AFRICA | AF DOLLAR | CAPETOWN

My file is as complicated as this is. Our need is to remove the "|" symbol from the WEB field and replace it with a junk value like #,$,& or anything.

The Output has to be:

NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
ABCD | 04  | GO#OGLE | EUROPE | EURO   | PARIS
XYZE | 12  | Y#A#HOO | USA    | DOLLAR | SEATTLE
LMNO | 17  | #FACE#B#O#O#K | ASIA | ASIAN DOLLAR | HONGKONG
EDDE | 98  | A##M#AZ#ON# | AFRICA | AF DOLLAR | CAPETOWN

I've tried awk'ing few filters to clear this mess up.nothing seems to find a happy ending. Thank you! I would like to thank few names who answered my prev question : RomanPerekhrest, Ed Morton,shellter , val rog.

Upvotes: 1

Views: 2141

Answers (6)

Ed Morton
Ed Morton

Reputation: 203493

$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==1 { outNf=NF; print; next }
{
    end = beg + (NF - outNf) - 1
    for (i=1; i<=NF; i++) {
        sep = (i>=beg && i<=end ? "#" : OFS)
        printf "%s%s", $i, (i<NF ? sep : ORS)
    }
}

$ awk -v beg=3 -f tst.awk file
 NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
 ABCD | 04  | GO#OGLE | EUROPE | EURO   | PARIS
 XYZE | 12  | Y#A#HOO | USA    | DOLLAR | SEATTLE
 LMNO | 17  | #FACE#B#O#O#K | ASIA | ASIAN DOLLAR | HONGKONG
 EDDE | 98  | A##M#AZ#ON# | AFRICA | AF DOLLAR | CAPETOWN

How it works: On the first line the number of fields to be output is the same as the number of fields on that line so it saves that number as outNF. From then on any subsequent line with more than outNF fields has outNF-NF fields starting at beg to be combined. So inside the loop it uses OFS between fields from 1 to beg, then from beg+1 to beg+(outNF-NF) it uses # between fields to create one merged output field from the input fields in that range, then it goes back to using OFS between fields.

Upvotes: 2

Erik Bennett
Erik Bennett

Reputation: 1099

I didn't try to put this in one line, but rather made it a little easier to read. Those who play perl golf will be able to reduce it considerably. The idea is to anchor the first two fields and the last three.

#!/usr/bin/perl

while(<DATA>) {
  chomp;
  if(($name, $num, $web, $location, $currency, $place) = $_ =~
     /^([^\|]+)\|([^\|]+)\|(.+)\|([^\|]+)\|([^\|]+)\|([^\|]+)$/) {
    $web =~ tr/\|/\_/;
    printf "%s\n", join('|', ($name, $num, $web, $location, $currency, $place));
  }
}
__DATA__
 NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
 ABCD | 04  | GO|OGLE | EUROPE | EURO   | PARIS
 XYZE | 12  | Y|A|HOO | USA    | DOLLAR | SEATTLE
 LMNO | 17  | |FACE|B|O|O|K | ASIA | ASIAN DOLLAR | HONGKONG
 EDDE | 98  | A||M|AZ|ON| | AFRICA | AF DOLLAR | CAPETOWN

Output:

 NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
 ABCD | 04  | GO_OGLE | EUROPE | EURO   | PARIS
 XYZE | 12  | Y_A_HOO | USA    | DOLLAR | SEATTLE
 LMNO | 17  | _FACE_B_O_O_K | ASIA | ASIAN DOLLAR | HONGKONG
 EDDE | 98  | A__M_AZ_ON_ | AFRICA | AF DOLLAR | CAPETOWN

Upvotes: 0

Rahul Verma
Rahul Verma

Reputation: 3089

A simple awk solution :

awk  -F "|" '{printf $1} 
{for(i=2; i<=NF; i++) { if(i>3 && i<NF-2)printf "#"$i; else printf "|"$i } printf "\n"} ' file

NAME|NUM|WEB|LOCATION|CURRENCY|PLACE
ABCD|04|GO#OGLE|EUROPE|EURO|PARIS
XYZE|12|Y#A#HOO|USA|DOLLAR|SEATTLE
LMNO|17|#FACE#B#O#O#K|ASIA|ASIANDOLLAR|HONGKONG
EDDE|98|A##M#AZ#ON#|AFRICA|AFDOLLAR|CAPETOWN

if(i>3 && i<NF-2) : this condition is for extra unwanted fields after 3rd field and before NF-2nd field. If it satisfies, prefix "#" before printing these extra fields.

Upvotes: 1

sumitya
sumitya

Reputation: 2691

Another awk solution can be:-

awk  -F'[[:space:]][|][[:space:]]' '{gsub(/\|/,"#",$3);print $1,"|",$2,"|",$3,"|",$4,"|",$5,"|",$6}' file.txt

Explanation:-

-F - for field separator here it is space|space
gsub - global substitution in field 3. i.e. every occurance of | will be replaced by #. 
print - just print all the columns separated by "|"

output will be:-

NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
ABCD | 04  | GO#OGLE | EUROPE | EURO   | PARIS
XYZE | 12  | Y#A#HOO | USA    | DOLLAR | SEATTLE
LMNO | 17  | #FACE#B#O#O#K | ASIA | ASIAN DOLLAR | HONGKONG
EDDE | 98  | A##M#AZ#ON# | AFRICA | AF DOLLAR | CAPETOWN

Upvotes: 1

Shakiba Moshiri
Shakiba Moshiri

Reputation: 23794

easy if you do not mind with Perl

If it has space; then we can print it by:

stackoverflow ❱ perl -F'\s+|\s+' -a -le  'print $F[5]' file
WEB
GO|OGLE
Y|A|HOO
|FACE|B|O|O|K
A||M|AZ|ON|
stackoverflow ❱  

Since we can modify the @F array in Perl; thus we can:

$F[5] =~ s/\|/#/g;  

It modifies only this column not others.

And eventually we can print it:

stackoverflow ❱ perl -F'\s+|\s+' -lae  '$F[5] =~ s/\|/#/g;print "@F"' file
 NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
 ABCD | 04 | GO#OGLE | EUROPE | EURO | PARIS
 XYZE | 12 | Y#A#HOO | USA | DOLLAR | SEATTLE
 LMNO | 17 | #FACE#B#O#O#K | ASIA | ASIAN DOLLAR | HONGKONG
 EDDE | 98 | A##M#AZ#ON# | AFRICA | AF DOLLAR | CAPETOWN
stackoverflow ❱  

If your file has no space, as someone commented me; then you can spread others columns; modify only that one and join them all together:

stackoverflow ❱ cat file2
NAME|NUM|WEB|LOCATION|CURRENCY|PLACE
ABCD|04|GO|OGLE|EUROPE|EURO|PARIS
XYZE|12|Y|A|HOO|USA|DOLLAR|SEATTLE
LMNO|17||FACE|B|O|O|K|ASIA|ASIANDOLLAR|HONGKONG
EDDE|98|A||M|AZ|ON||AFRICA|AFDOLLAR|CAPETOWN
stackoverflow ❱ perl -F'\|' -le  '$s=$#F;$e="@F[2..$s-3]";$e=~s/ +/#/g;print join "|", @F[0..1],$e,join "|",@F[$s-2,$s-1,$s]' file2
NAME|NUM|WEB|LOCATION|CURRENCY|PLACE
ABCD|04|GO#OGLE|EUROPE|EURO|PARIS
XYZE|12|Y#A#HOO|USA|DOLLAR|SEATTLE
LMNO|17|#FACE#B#O#O#K|ASIA|ASIANDOLLAR|HONGKONG
EDDE|98|A#M#AZ#ON#|AFRICA|AFDOLLAR|CAPETOWN

Upvotes: 1

anubhava
anubhava

Reputation: 785128

You can use this awk command:

awk 'BEGIN{FS=OFS="|"} NR==1{n=NF} NF > n {
s=$3; for (i=4; i<=NF-3; i++) {s = s "#" $i; $i=""} $3=s; gsub(/\|{2,}/, "|")} 1' file

NAME | NUM | WEB | LOCATION | CURRENCY | PLACE
ABCD | 04  | GO#OGLE | EUROPE | EURO   | PARIS
XYZE | 12  | Y#A#HOO | USA    | DOLLAR | SEATTLE
LMNO | 17  | #FACE#B#O#O#K | ASIA | ASIAN DOLLAR | HONGKONG
EDDE | 98  | A##M#AZ#ON# | AFRICA | AF DOLLAR | CAPETOWN

Upvotes: 2

Related Questions