rcubefather

Reputation: 1564

TCL-REGEX: How to filter lines that appear multiple times in a text file using TCL regexp

Input file (resultnew.txt):

www.maannews.net.

www.maannews.net.

 ################################################# 

attach2.mobile01.com.

www.google-analytics.

attach2.mobile01.com.

attach2.mobile01.com.

www.google-analytics.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

www.google.com.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

 ################################################# 

cdn-img.mocospace.com

cdn-img.mocospace.com

www.mocospace.com.

cdn-img.mocospace.com

cdn-img.mocospace.com

cdn-img.mocospace.com

www.mocospace.com.

cdn-img.mocospace.com

www.mocospace.com.

www.google-analytics.

www.google-analytics.

fonts.gstatic.com.

cdn-img.mocospace.com

cdn-img.mocospace.com

fonts.gstatic.com.

fonts.gstatic.com.

 ################################################# 

My TCL Script:

set a [open resultnew.txt r]
set b [open balu_output.txt w]

while {[gets $a a1] >= 0} {
    if {[regexp {[a-zA-Z\.]} $a1]} {
        puts $b $a1
    }
}

My Requirement:

  1. From the above text file, I want to remove the lines that appear multiple times and print each of them only once into a new file.
  2. Point 1 should happen between each pair of "#################" dividers; the "#################" lines themselves should still appear in the output file.

Please help me with your ideas. Thanks in advance.

Thanks,

Balu P.

Upvotes: 1

Views: 1379

Answers (2)

narendra

Reputation: 1278

What I understand from your question is that you need distinct values between the divider lines (i.e. the hash lines). Below is the solution you are looking for: the script uses array keys to keep track of unique values, and re-initializes the array whenever the next divider line (your hash comment line) is seen.

I printed the values on STDOUT; you can redirect them to another file.

#!/usr/bin/tclsh
set a [open resultnew.txt r]

# Array used to keep the unique records seen in the current section
array set myarray {}

# For each line in the input
while {[gets $a a1] >= 0} {

    # Get rid of extra spaces
    set a1 [string trim $a1]

    # If a divider line is found, print it (i.e. the #### line)
    if {[string match "#*" $a1]} {
        puts $a1
        # Reset the array for the next set of entries
        array unset myarray
    } elseif {$a1 ne ""} {
        # Print only if the line does not already exist in the array
        if {![info exists myarray($a1)]} {
            puts $a1
        }
        set myarray($a1) 1
    }
}

Output of the script using your input file:

$tclsh main.tcl
www.maannews.net.
#################################################
attach2.mobile01.com.
www.google-analytics.
www.google.com.
#################################################
cdn-img.mocospace.com
www.mocospace.com.
www.google-analytics.
fonts.gstatic.com.
#################################################

Upvotes: 1

Donal Fellows

Reputation: 137787

You need a different way to check whether to ignore lines, and an array is great for doing the uniqueness check. Here's an annotated version:

# For each line in the input
while {[gets $a a1] >= 0} {
    # Get rid of extra spaces
    set a1 [string trim $a1]
    # Ignore empty and comment lines; [string match] is great for this!
    if {$a1 eq "" || [string match "#*" $a1]} {
        continue
    }
    # See if this is the first time we've seen a line
    if {[incr occurrences($a1)] == 1} {
        # It is! Print it now
        puts $b $a1
    }
}

If you have a horribly large file you might eventually run into problems with memory usage. But for files with (up to) just a few million lines, you should be fine.
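Since the snippet above drops the divider lines entirely, here is a sketch (not the answer's code) that combines the `incr` idiom with the question's per-section requirement: dividers are kept in the output and the counters are reset at each one. The sample file (`sample2.txt`) is a made-up stand-in, and `tclsh` is assumed to be on PATH:

```shell
# Hypothetical sample input in the question's divider-separated format
printf '%s\n' x.net x.net '###' y.org x.net > sample2.txt

tclsh <<'EOF' > uniq2.txt
set fh [open sample2.txt r]
while {[gets $fh line] >= 0} {
    set line [string trim $line]
    if {[string match "#*" $line]} {
        puts $line
        unset -nocomplain occurrences   ;# new section: forget what we've seen
        continue
    }
    # [incr] auto-creates the element, returning 1 the first time a key is seen
    if {$line ne "" && [incr occurrences($line)] == 1} {
        puts $line
    }
}
close $fh
EOF

cat uniq2.txt
```

Here `x.net` appears once before the divider and once after it, since each section starts with an empty `occurrences` array.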

Upvotes: 1
