discipulus
discipulus

Reputation: 2725

Replace text using a regex only iff it is a field

I have a comma delimited file that looks like the following

Start,Card,App,Description,Internet
2013-01-02 11:54:00,No Data,Off,Our system has  No Data,Off
2013-01-03 05:30:00,Visa,On,The monitoring system is Off,Off
2013-01-05 17:07:00,Master,Off,The system is On,On
2013-01-07 00:08:00,No Data,No Data,No Data,On
2013-01-09 04:40:00,Master,Off,System is Off,On

I want to replace No Data with NA, Off with 0, and On with 1 only iff these words are fields, i.e. that is they have comma or beginning of newline or end of line. The required output is

Start,Card,App,Description,Internet
2013-01-02 11:54:00,NA,0,Our system has  No Data,0
2013-01-03 05:30:00,Visa,1,The monitoring system is Off,0
2013-01-05 17:07:00,Master,0,The system is On,1
2013-01-07 00:08:00,NA,NA,NA,1
2013-01-09 04:40:00,Master,0,System is Off,1

I tried for example,

sed 's/,Off,/,0,/g' test.txt

will only replace if the number of fields is greater than one or less than maximum number of fields. I also tried using look-arounds and leaving out header like

sed '2,$s/(?<=^|,)Off(?=$|,)/0/g' test.txt

I think this second command should work but didn't. Where did I go wrong? Thanks for the tips. Since the file is biggg, absolute indexing for columns like $4 is not an option.

Upvotes: 0

Views: 74

Answers (4)

potong
potong

Reputation: 58478

This might work for you (GNU sed):

sed -r ':a;s/(^|,)No Data(,|$)/\1NA\2/g;s/(^|,)Off(,|$)/\10\2/g;s/(^|,)On(,|$)/\11\2/g;ta' file

Use back references with alternation to replace required string and delimiters.

N.B. Delimiters either side of a required string must be tested twice hence the loop back to the place holder :a.

Upvotes: 1

Sundeep
Sundeep

Reputation: 23677

sed doesn't support lookarounds, perl to the rescue

$ perl -pe 'if($. > 1){ s/(^|,)\KOff(?=,|$)/0/g; s/(^|,)\KOn(?=,|$)/1/g; s/(^|,)\KNo Data(?=,|$)/NA/g; }' ip.txt
Start,Card,App,Description,Internet
2013-01-02 11:54:00,NA,0,Our system has  No Data,0
2013-01-03 05:30:00,Visa,1,The monitoring system is Off,0
2013-01-05 17:07:00,Master,0,The system is On,1
2013-01-07 00:08:00,NA,NA,NA,1
2013-01-09 04:40:00,Master,0,System is Off,1

Or, split input line on , and perform substitution on individual fields

$ perl -F, -lane 'if($. > 1){ foreach (@F){ s/^No Data$/NA/; s/^Off$/0/; s/^On$/1/ } } print join ",",@F' ip.txt 
Start,Card,App,Description,Internet
2013-01-02 11:54:00,NA,0,Our system has  No Data,0
2013-01-03 05:30:00,Visa,1,The monitoring system is Off,0
2013-01-05 17:07:00,Master,0,The system is On,1
2013-01-07 00:08:00,NA,NA,NA,1
2013-01-09 04:40:00,Master,0,System is Off,1


with awk

$ awk -v FS=, -v OFS=, 'NR>1{ for(i=1; i<=NF; i++){ sub(/^No Data$/,"NA",$i); sub(/^Off$/,"0",$i); sub(/^On$/,"1",$i) } } 1' ip.txt 
Start,Card,App,Description,Internet
2013-01-02 11:54:00,NA,0,Our system has  No Data,0
2013-01-03 05:30:00,Visa,1,The monitoring system is Off,0
2013-01-05 17:07:00,Master,0,The system is On,1
2013-01-07 00:08:00,NA,NA,NA,1
2013-01-09 04:40:00,Master,0,System is Off,1

Upvotes: 2

zxy
zxy

Reputation: 158

awk -F, '{for(i=1;i<=NF;i++) if(split($i,a,"Off|On|No Data| +")==2) {sub(/No Data/,"NA",$i);sub(/Off/,"0",$i);sub(/On/,"1",$i)}}1' OFS="," file

Upvotes: 1

James Brown
James Brown

Reputation: 37424

In awk using ternary operators (c ? "a" : "b"):

$ awk -F, '{for(i=NF;i>0;i--) $i=($i=="No Data"?"NA":($i=="Off"?"0":($i=="On"?"1":$i)))} 1' OFS=, file
Start,Card,App,Description,Internet
2013-01-02 11:54:00,NA,0,Our system has  No Data,0
2013-01-03 05:30:00,Visa,1,The monitoring system is Off,0
2013-01-05 17:07:00,Master,0,The system is On,1
2013-01-07 00:08:00,NA,NA,NA,1
2013-01-09 04:40:00,Master,0,System is Off,1

Upvotes: 2

Related Questions