Reputation: 3781
I want to print
userId = 1234
userid = 12345
timestamp = 88888888
js = abc
from my data:
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
<input name="userId" value="1234" type="hidden"> messsssssssssssssssssss
<input name="userid" value="12345" type="hidden"> messssssssssssssssssss
<input name="timestamp" value="88888888" type="hidden"> messssssssssssss
<input name="js" value="abc" type="hidden"> messssssssssssssssssssssssss
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
How can I do this with AWK (or whatever)? Assume that my data is stored in the "$info" variable (single-line data).
Edit: by single-line data I mean that all the data is on one line, like this:
messss...<input name="userId" value="1234" type="hidden">messsss...<input ....>messssssss
So I can't use grep to extract the section I'm interested in.
Upvotes: 1
Views: 1783
Reputation: 1
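Here is a short awk one-liner (it needs bash for the <(...) process substitution):
awk 'BEGIN { FS = "\""; RS = "<" } /=/ { print $2 " = " $4 }' <(printf '%s' "$info")
Explanation:
RS="<" -- break the text into records at every "<", so each tag starts a new record
FS="\"" -- break records into fields at every "
/=/ -- keep only records containing =
{ print $2 " = " $4 } -- print the 2nd field (the name) and the 4th field (the value) joined by " = "
printf '%s' "$info" -- feed the variable to awk unchanged (quoted, so the shell does not split it into words)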
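A quick way to check the one-liner is to run it on a shortened single-line sample. This is only a sketch: the sample text and the extract_pairs function name are made up for illustration, and bash is assumed because of the <(...) process substitution.

```bash
#!/usr/bin/env bash
# Hypothetical single-line sample modeled on the question (filler text shortened).
info='messss<input name="userId" value="1234" type="hidden">messss<input name="userid" value="12345" type="hidden">messss<input name="timestamp" value="88888888" type="hidden">messss<input name="js" value="abc" type="hidden">messss'

# extract_pairs (hypothetical helper): split the text into records at "<"
# and into fields at '"', then print field 2 (name) and field 4 (value)
# of every record that contains "=".
extract_pairs() {
    awk 'BEGIN { FS = "\""; RS = "<" } /=/ { print $2 " = " $4 }' <(printf '%s' "$1")
}

extract_pairs "$info"
```

The leading "messss" record contains no "=", so the /=/ filter drops it and only the four name/value lines are printed.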
Upvotes: 0
Reputation: 4797
Tools like awk and sed can be used together with XMLStarlet and HTML Tidy to parse HTML.
Upvotes: 0
Reputation: 89171
AWK:
BEGIN {
    # Use record separator "<", instead of "\n".
    RS = "<"
    first = 1
}
# Skip the first record, as that begins before the first tag
first {
    first = 0
    next
}
/^input[^>]*>/ {
    # make sure we don't match outside of the tag
    end = match($0, />/)
    # locate the name attribute
    pos = match($0, /name="[^"]*"/)
    if (pos == 0 || pos > end) { next }
    name = substr($0, RSTART + 6, RLENGTH - 7)
    # locate the value attribute
    pos = match($0, /value="[^"]*"/)
    if (pos == 0 || pos > end) { next }
    value = substr($0, RSTART + 7, RLENGTH - 8)
    # print out the result
    print name " = " value
}
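The +6/-7 and +7/-8 offsets come from how match() sets RSTART (1-based start of the match) and RLENGTH (its length). A small worked example on one record, as the script would see it after RS = "<" has removed the "<"; the record variable and the show_offsets function are hypothetical.

```bash
#!/usr/bin/env bash
# One record as the script would see it (the leading "<" is already gone).
record='input name="userId" value="1234" type="hidden">messss'

show_offsets() {
    awk -v rec="$1" 'BEGIN {
        # match() sets RSTART and RLENGTH for the name="..." attribute
        match(rec, /name="[^"]*"/)
        # skip the 6 characters of name=" and drop the closing quote (RLENGTH - 7)
        print RSTART, RLENGTH, substr(rec, RSTART + 6, RLENGTH - 7)
        # value=" is 7 characters long, so the offsets become +7 and -8
        match(rec, /value="[^"]*"/)
        print RSTART, RLENGTH, substr(rec, RSTART + 7, RLENGTH - 8)
    }'
}

show_offsets "$record"
```

This prints "7 13 userId" and "21 12 1234". To run the full script, one option is to save it to a file (extract.awk is just a placeholder name) and feed it the variable with printf '%s' "$info" | awk -f extract.awk.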
Upvotes: 0
Reputation: 881103
I'm not sure I understand your "single line data" comment, but if this is in a file, you can just do something like:
cat file |
    grep '^<input ' |
    sed 's/^<input name="//' |
    sed 's/" value="/ = /' |
    sed 's/".*$//'
Here's the cut'n'paste version:
cat file | grep '^<input ' | sed 's/^<input name="//' | sed 's/" value="/ = /' | sed 's/".*$//'
This turns:
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
<input name="userId" value="1234" type="hidden"> messsssssssssssssssssss
<input name="userid" value="12345" type="hidden"> messssssssssssssssssss
<input name="timestamp" value="88888888" type="hidden"> messssssssssssss
<input name="js" value="abc" type="hidden"> messssssssssssssssssssssssss
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
quite happily into:
userId = 1234
userid = 12345
timestamp = 88888888
js = abc
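The grep simply extracts the lines you want, while the sed commands respectively:
- strip off the leading <input name=" part;
- turn the " value=" between the two attributes into " = ";
- delete everything from the closing quote of the value to the end of the line.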
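The pipeline relies on each <input> tag starting its own line. For the single-line data in the question, one possible sketch is to first break the text before every "<" and then collapse the three substitutions into a single sed -n ... p. Assumptions: GNU sed (the \n in the replacement is a GNU extension), name appearing directly before value inside the tag, and a hypothetical input_pairs helper and sample string.

```bash
#!/usr/bin/env bash
# Hypothetical single-line sample.
info='messss<input name="userId" value="1234" type="hidden">messss<input name="js" value="abc" type="hidden">messss'

input_pairs() {
    # Step 1 (GNU sed): put a newline before every "<" so each tag starts a line.
    # Step 2: one substitution doing the work of the three seds; -n plus the
    #         p flag prints only the lines that matched the <input ...> shape.
    printf '%s\n' "$1" |
        sed 's/</\n</g' |
        sed -n 's/^<input name="\([^"]*\)" value="\([^"]*\)".*/\1 = \2/p'
}

input_pairs "$info"
```

Using -n with the p flag also makes the separate grep unnecessary, since lines that do not start with the expected tag are never printed.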
Upvotes: 4
Reputation: 246744
IMO, parsing HTML should be done with a proper HTML/XML parser. For example, Ruby has an excellent package, Nokogiri, for parsing HTML/XML:
ruby -e '
require "rubygems"
require "nokogiri"
doc = Nokogiri::HTML.parse(ARGF.read)
doc.search("//input").each do |node|
atts = node.attributes
puts "%s = %s" % [atts["name"], atts["value"]]
end
' mess.html
produces the output you're after.
Upvotes: 1
Reputation: 21
Using Perl (the while /.../g loop prints every name/value pair, even when they are all on one line):
cat file | perl -ne 'print "$1 = $2\n" while /name="([^"]*)"\s+value="([^"]*)"/g'
Upvotes: 2
Reputation: 496732
This part should probably be a comment on Pax's answer, but it got a bit long for that little box. I'm thinking 'single line data' means you don't have any newlines in your variable at all? Then this will work:
echo "$info" | sed -n -r '/<input/{s/^[^<]*//;s/<input +name="([^"]+)" +value="([^"]+)"[^>]*>[^<]*/\1 = \2\n/gp;}'
Notes on interesting bits:
- -n means don't print by default; we'll say when to print with that p at the end.
- -r means extended regex (a GNU sed option, as is the \n in the replacement).
- /<input/ at the beginning makes sure we don't even bother to work on lines that don't contain the desired pattern.
- s/^[^<]*// throws away the text before the first tag, which would otherwise stay glued to the first result.
- That \n at the end is there to ensure all records end up on separate lines. Any original newlines will still be there (as will an empty line left by the final \n), and the fastest way to get rid of them is to tack '| grep .' onto the end. You could use some sed magic, but you wouldn't be able to understand it thirty seconds after you typed it in.
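Putting those notes together, a sketch of the whole thing on a single-line sample, with the '| grep .' cleanup included. GNU sed is assumed (-r and the \n in the replacement), and the sample string and sed_pairs helper are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical single-line sample.
info='messss<input name="userId" value="1234" type="hidden">messss<input name="js" value="abc" type="hidden">messss'

sed_pairs() {
    # s/^[^<]*// drops the text before the first tag; the s///g then turns
    # each "<input ...>trailing text" into "name = value" plus a newline.
    # grep . removes the empty line left behind by the final \n.
    printf '%s\n' "$1" |
        sed -n -r '/<input/{s/^[^<]*//;s/<input +name="([^"]+)" +value="([^"]+)"[^>]*>[^<]*/\1 = \2\n/gp;}' |
        grep .
}

sed_pairs "$info"
```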
I can think of ways to do this in awk, but this is really a job for sed (or perl!).
Upvotes: 3
Reputation: 75704
To process variables that contain more than one line, you need to put the variable name in double quotes:
echo "$info"|sed 's/^\(<input\( \)name\(=\)"\([^"]*\)" value="\([^"]*\)"\)\?.*/\4\2\3\2\5/'
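To see why the quotes matter: without them the shell splits $info into words and echo joins them with single spaces, so a multi-line value reaches sed as one line, and this sed, which handles one tag per line, keeps only the first pair. A sketch with a hypothetical two-line sample and a hypothetical pairs_sed wrapper around the command above (GNU sed assumed, since \? in a basic regex is a GNU extension).

```bash
#!/usr/bin/env bash
# Hypothetical two-line value.
info=$'<input name="userId" value="1234" type="hidden">\n<input name="js" value="abc" type="hidden">'

# The sed command from the answer, wrapped for reuse.
pairs_sed() {
    sed 's/^\(<input\( \)name\(=\)"\([^"]*\)" value="\([^"]*\)"\)\?.*/\4\2\3\2\5/'
}

echo "unquoted:"
echo $info | pairs_sed     # one joined line, so only "userId = 1234" survives
echo "quoted:"
echo "$info" | pairs_sed   # newline preserved, so both pairs are printed
```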
Upvotes: 2