SHEEN

Reputation: 41

sed / awk complex line replacement

I want to replace thousands of lines like the one below, but I'm having a hard time making it work. I also have two variables, $date and $time, that should act as a condition so the replacement is not applied globally:

Example: <!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>

Replace with: <!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>NaN</v></row>

I tried with sed:

sed -i '<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>5.0000000000e+00<\/v><\/row>.*/<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>NaN<\/v><\/row>/' dump_teste.xml

sed: -e expression #1, char 1: unknown command: `<'

Also with awk:

awk '{gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1' tmp.txt
    awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
    awk: cmd. line:1:                                                     ^ syntax error
    awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
    awk: cmd. line:1:                                                                               ^ syntax error
    awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
    awk: cmd. line:1:                                                                                                                                         ^ syntax error
    awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
    awk: cmd. line:1:                                                                                                                                                      ^ syntax error
    awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
    awk: cmd. line:1:                                                                                                                                                                ^ unterminated string
    awk: cmd. line:1: {gsub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
    awk: cmd. line:1:                                                                                                                                                                ^ syntax error

or

awk '{sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1' tmp.txt
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                    ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                              ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                        ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                     ^ syntax error
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                               ^ unterminated string
awk: cmd. line:1: {sub(/<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>/,"<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>NaN</v></row>")}1
awk: cmd. line:1:                                                                                                                                                               ^ syntax error

Upvotes: 0

Views: 87

Answers (2)

user12859859

Reputation:

The command below replaces the number with NAN on every line that falls within a time range, regardless of the order in which the lines appear in the file.

Set the date, from and till variables, then run:

while IFS= read -r in; do out="$(echo "$in" | awk '{print $2}')" && outtime="$(echo "$in" | awk '{print $3}')" && sed -i "/"$out" "$outtime"/ s/<v>.*<\/v>/<v>NAN<\/v>/" dumpteste.xml; done <<< "$(sort -k3 -k4 -k5 dumpteste.xml | awk -v date="$date" -v from="$from" -v till="$till" '$2 == date && $3 >= from && $3 <= till' | tac)"

Example of above command

cat dumpteste.xml         #original file
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:48:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 17:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-08-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>



date=2020-07-06
from=16:45:00
till=17:45:00
Output
cat dumpteste.xml      #after change

<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 16:47:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 16:48:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-07-06 17:45:00 WEST / 1594050300 --> <row><v>NAN</v></row>
<!-- 2020-08-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>

For date 2020-07-06 and time range 16:45:00-17:45:00, the lines with times 16:45, 16:47, 16:48 and 17:45 are changed. The line with time 17:47 is outside the range, and the line with time 16:45 but date 2020-08-06 is left unchanged because the date does not match.

If you need a date range as well, define four variables (date, enddate, from, till) and run the command below:

date=2020-07-06
enddate=2020-08-06
from=16:45:00
till=17:45:00
while IFS= read -r in; do out="$(echo "$in" | awk '{print $2}')" && outtime="$(echo "$in" | awk '{print $3}')" && sed -i "/"$out" "$outtime"/ s/<v>.*<\/v>/<v>NAN<\/v>/" dumpteste.xml; done <<< "$(sort -k3 -k4 -k5 dumpteste.xml | awk -v date="$date" -v from="$from" -v till="$till" -v enddate="$enddate" '$2 >= date && $2 <= enddate && $3 >= from && $3 <= till' | tac)"

The commands above change the values on lines whose date and time fall within the given ranges. I hope that covers it.

Shorter version:

1) With a time range

date=2020-07-06 && from=16:45:00 && till=17:45:00 && gawk -i inplace -v date="$date" -v from="$from" -v till="$till" '$2 == date && $3 >= from && $3 <= till {gsub(/<v>[^<]*/, "<v>NaN")}1' dumpteste.xml

2) With both a date range and a time range (this one prints the result to standard output; use gawk -i inplace as in 1 to edit the file)

date=2020-07-06 && from=16:45:00 && till=17:45:00 && enddate=2020-08-06 && awk -v date="$date" -v from="$from" -v till="$till" -v enddate="$enddate" '$2 >= date && $2 <= enddate && $3 >= from && $3 <= till {gsub(/<v>[^<]*/, "<v>NaN")}1' dumpteste.xml
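If gawk's inplace extension is not available, the same filter can be sketched with any POSIX awk by writing to a temporary file and moving it over the original. This is only an illustration under assumptions: the file name demo.xml and the temporary directory are hypothetical, and the sample lines are copied from the example above.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
f="$workdir/demo.xml"   # hypothetical file name for this sketch

cat > "$f" <<'EOF'
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 17:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-08-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
EOF

date=2020-07-06 from=16:45:00 till=17:45:00

# With whitespace splitting, $2 is the date and $3 the time.
# Plain string comparison works because both are fixed-width and zero-padded.
awk -v date="$date" -v from="$from" -v till="$till" '
  $2 == date && $3 >= from && $3 <= till { sub(/<v>[^<]*<\/v>/, "<v>NaN</v>") }
  { print }
' "$f" > "$f.tmp" && mv "$f.tmp" "$f"

cat "$f"
```

Only the first sample line falls inside the date and time window, so it is the only one whose value is replaced. The file is read once, instead of running sed -i once per matching line.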

Upvotes: 0

user12859859

Reputation:

Your sed command is missing the s (substitute) command at the start of the expression; that is why it gives the error.

sed -i 's/<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>5.0000000000e+00<\/v><\/row>.*/<!-- 2020-07-06 16:45:00 WEST \/ 1594050300 --> <row><v>NaN<\/v><\/row>/g' dumpteste.xml

or

sed -i 's/<v>.*<\/v>/<v>NAN<\/v>/g' dumpteste.xml
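The slashes in the XML are also what break the awk attempts in the question: inside a /.../ regex literal an unescaped / ends the pattern early. A minimal sketch of two ways around the escaping, run on a single sample line taken from the question (the variable names line, sed_out and awk_out are only for illustration):

```shell
line='<!-- 2020-07-08 12:00:00 WEST / 1594206000 --> <row><v>1.9933333333e+00</v></row>'

# sed: the character right after "s" is the delimiter, so with "|" the
# slashes in the XML need no escaping. [^<]* stops at the closing tag.
sed_out=$(printf '%s\n' "$line" | sed 's|<v>[^<]*</v>|<v>NaN</v>|')

# awk: inside a /.../ regex literal every "/" has to be written as "\/".
# Matching only the <v> element also avoids having to escape "." and "+".
awk_out=$(printf '%s\n' "$line" | awk '{ sub(/<v>[^<]*<\/v>/, "<v>NaN</v>") } 1')

printf '%s\n%s\n' "$sed_out" "$awk_out"
```

Both commands print the line with the value replaced by NaN and the comment part left untouched.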

You have two variables, $date and $time, and want to apply sed only to lines that match them. Do the following:

sed "/$date $time .*<\/row>/ s/<v>.*<\/v>/<v>NAN<\/v>/g" dumpteste.xml

In the above command, if the line is

<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>

and the date and time variables are

date='2020-07-06' time='16:45:00'

then only the lines containing that date and time will be edited by sed.
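To see the address in action without touching a real dump, here is a small sketch on two sample lines. The temporary directory and the name demo.xml are assumptions made for the example, and the result is captured in a variable rather than written back with -i.

```shell
workdir=$(mktemp -d)
f="$workdir/demo.xml"   # hypothetical file name for this sketch

cat > "$f" <<'EOF'
<!-- 2020-07-06 16:45:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
<!-- 2020-07-06 16:47:00 WEST / 1594050300 --> <row><v>5.0000000000e+00</v></row>
EOF

date='2020-07-06' time='16:45:00'

# The /.../ address selects lines containing "$date $time "; the s command
# runs only on those lines. Double quotes let the shell expand the variables
# before sed parses the script.
result=$(sed "/$date $time /s|<v>[^<]*</v>|<v>NaN</v>|" "$f")

printf '%s\n' "$result"
```

The 16:45:00 line comes out with NaN and the 16:47:00 line is printed unchanged. Once the output looks right, adding -i applies the same edit to the file itself.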


Did that solve your problem?

Upvotes: 2
