Reputation: 1564
Input file (resultnew.txt):
www.maannews.net.
www.maannews.net.
#################################################
attach2.mobile01.com.
www.google-analytics.
attach2.mobile01.com.
attach2.mobile01.com.
www.google-analytics.
attach2.mobile01.com.
attach2.mobile01.com.
attach2.mobile01.com.
attach2.mobile01.com.
attach2.mobile01.com.
www.google.com.
attach2.mobile01.com.
attach2.mobile01.com.
attach2.mobile01.com.
#################################################
cdn-img.mocospace.com
cdn-img.mocospace.com
www.mocospace.com.
cdn-img.mocospace.com
cdn-img.mocospace.com
cdn-img.mocospace.com
www.mocospace.com.
cdn-img.mocospace.com
www.mocospace.com.
www.google-analytics.
www.google-analytics.
fonts.gstatic.com.
cdn-img.mocospace.com
cdn-img.mocospace.com
fonts.gstatic.com.
fonts.gstatic.com.
#################################################
My Tcl script:
set a [open resultnew.txt r]
set b [open balu_output.txt w]
while {[gets $a a1] >= 0} {
    if {[regexp {[a-zA-Z\.]} $a1]} {
        puts $b $a1
    }
}
My Requirement:
Please help me with your ideas. Thanks in advance.
Thanks,
Balu P.
Upvotes: 1
Views: 1379
Reputation: 1278
As I understand your question, you want the distinct values within each section between the divider lines (the lines of hashes). The script below does that: it uses array keys to remember the values already seen and clears the array each time it reaches the next divider line.
It prints to stdout; you can redirect the output to another file.
#!/usr/bin/tclsh
set a [open resultnew.txt r]
# Array whose keys are the entries already seen in the current section
array set myarray {}
# For each line in the input
while {[gets $a a1] >= 0} {
    # Get rid of leading and trailing whitespace
    set a1 [string trim $a1]
    if {[string match "#*" $a1]} {
        # Divider line (####...): print it
        puts $a1
        # Forget the entries seen so far so the next section starts fresh
        array unset myarray
    } elseif {$a1 ne ""} {
        # Print a non-empty line only the first time it appears in this section
        if {![info exists myarray($a1)]} {
            puts $a1
            set myarray($a1) 1
        }
    }
}
close $a
Output of the script on your input file:
$ tclsh main.tcl
www.maannews.net.
#################################################
attach2.mobile01.com.
www.google-analytics.
www.google.com.
#################################################
cdn-img.mocospace.com
www.mocospace.com.
www.google-analytics.
fonts.gstatic.com.
#################################################
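If you want the result in a file (your script opens balu_output.txt) rather than on stdout, the same per-section logic can be split into a list-based helper plus a small file wrapper. This is only a minimal sketch: the proc names `uniquePerSection` and `uniquePerSectionFile` are hypothetical, it uses a dict instead of an array, and the wrapper reads the whole input into memory, which should be fine for files of modest size.

```tcl
# Hypothetical helper: keep the first occurrence of each line within a
# section. A line starting with "#" is a divider: it is kept and it
# resets the set of lines seen so far. Empty lines are dropped.
proc uniquePerSection {lines} {
    set result {}
    set seen [dict create]
    foreach line $lines {
        set line [string trim $line]
        if {$line eq ""} {
            continue
        }
        if {[string match "#*" $line]} {
            # Divider: emit it and start a fresh section
            lappend result $line
            set seen [dict create]
        } elseif {![dict exists $seen $line]} {
            # First time this line appears in the current section
            lappend result $line
            dict set seen $line 1
        }
    }
    return $result
}

# Hypothetical wrapper: read inPath whole, write the filtered lines to outPath.
proc uniquePerSectionFile {inPath outPath} {
    set in [open $inPath r]
    set lines [split [read $in] \n]
    close $in
    set out [open $outPath w]
    foreach line [uniquePerSection $lines] {
        puts $out $line
    }
    close $out
}
```

For example, `uniquePerSectionFile resultnew.txt balu_output.txt` should write the same lines that the script above prints.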
Upvotes: 1
Reputation: 137787
You need a different way to check whether to ignore lines, and an array is great for doing the uniqueness check. Here's an annotated version:
# For each line in the input
while {[gets $a a1] >= 0} {
    # Get rid of extra spaces
    set a1 [string trim $a1]
    # Ignore empty and comment lines; [string match] is great for this!
    if {$a1 eq "" || [string match "#*" $a1]} {
        continue
    }
    # See if this is the first time we've seen a line
    if {[incr occurrences($a1)] == 1} {
        # It is! Print it now
        puts $b $a1
    }
}
If you have a horribly large file you might eventually run into problems with memory usage. But for files with (up to) just a few million lines, you should be fine.
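For reference, here is the same first-occurrence check wrapped in a self-contained proc that opens and closes its own channels. It is a sketch, and the name `uniqueLinesFile` is hypothetical. It relies on `incr` creating a missing variable, which Tcl does since 8.5. Because it reads with `gets`, the file is streamed and only the `occurrences` array stays in memory; that array has one entry per distinct line, so memory use grows with the number of distinct lines rather than with the file size.

```tcl
# Hypothetical helper: stream inPath line by line and write each
# non-empty, non-comment line to outPath the first time it appears.
proc uniqueLinesFile {inPath outPath} {
    set in [open $inPath r]
    set out [open $outPath w]
    # One array entry per distinct line seen so far
    array set occurrences {}
    while {[gets $in line] >= 0} {
        set line [string trim $line]
        # Skip empty lines and divider/comment lines
        if {$line eq "" || [string match "#*" $line]} {
            continue
        }
        # incr starts a missing element at 0, so 1 means first sighting
        if {[incr occurrences($line)] == 1} {
            puts $out $line
        }
    }
    close $in
    close $out
}
```

Usage: `uniqueLinesFile resultnew.txt balu_output.txt`.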
Upvotes: 1