Reputation: 7528
I am trying to figure out a proper way to parse a stream of data using Perl. I have read through many of the examples, documentation and questions, but could not find how to cut a "package" out of the stream of data and process it. This is the situation:
- a stream of data comes from a certain IP to an IP and port
- the stream contains some gibberish, and then something between a start marker and an end marker, with the data in there being semicolon-separated
My attempt so far is to have a socket listening on the port and process the $data variable:
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
# autoflush STDOUT (the currently selected handle), not the socket;
# IO::Socket sockets already autoflush by default
$| = 1;
# create a listening socket
my $socket = IO::Socket::INET->new(
    LocalHost => '127.0.0.1',
    LocalPort => '7070',
    Proto     => 'tcp',
    Listen    => 5,
    Reuse     => 1,
);
die "cannot create socket $!\n" unless $socket;
print "server waiting for client connection on port 7070\n";
while (1) {
    # wait for a new client connection
    my $client_socket = $socket->accept();
    # get information about the newly connected client
    my $client_address = $client_socket->peerhost();
    my $client_port    = $client_socket->peerport();
    print "connection from $client_address:$client_port\n";
    # read up to 1024 bytes from the connected client
    my $data = "";
    $client_socket->recv($data, 1024);
    print "received data: $data\n";
    my @data_array = split(/;/, $data);
    foreach (@data_array) {
        print "$_\n";
    }
    # write response data to the connected client
    $data = "ok";
    $client_socket->send($data);
    # notify client that response has been sent
    shutdown($client_socket, 1);
}
$socket->close();
This works, but as far as I understand it, it reads whatever has arrived into $data, up to the given size (1024 bytes), and then processes that.
My question: how can I identify the part I need (start to end), process it, and then move on to the next one?
Upvotes: 0
Views: 1248
Reputation: 7528
I solved this by using the original code and adding:
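my $message = '';
if ($data =~ /<START>>/) {
    print "\nFound start\n";
    $message .= $data;
    while ($message !~ /END/) {
        # $message_length (bytes to read per call) is not shown in this snippet
        $client_socket->recv($data, $message_length);
        last unless length $data;   # nothing read (peer closed): stop instead of looping forever
        $message .= $data;
        print "\nStill reading\n";
    }
    print "\nFound end\n"; # but may contain (part of) next START
}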
I still need to implement the part where I check if the chunk read has part of the next message, but I'll figure that out. Thank you for the help!
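For the remaining step, a minimal sketch (not part of the original solution) of splitting the accumulated `$message` at the first end marker. It assumes the marker is the literal string `END`, as in the `/END/` check above; `split_at_end` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split an accumulated buffer at the first end marker.
# Returns the complete message (including the marker) and whatever followed
# it, which may be the beginning of the next message. Returns undef as the
# first value if the marker has not arrived yet.
sub split_at_end {
    my ($message, $marker) = @_;
    my $pos = index($message, $marker);
    return (undef, $message) if $pos < 0;   # end marker not seen yet
    my $cut = $pos + length($marker);
    return (substr($message, 0, $cut), substr($message, $cut));
}

my ($complete, $rest) = split_at_end("<START>>a;b;cEND<START>>d;", 'END');
print "complete: '$complete'\nrest: '$rest'\n";
```

`$rest` would then become the initial contents of `$message` for the next message instead of being thrown away. Note that a plain `END` marker breaks if the payload itself can contain that text.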
Upvotes: 0
Reputation: 385799
I've never understood why people use recv to read from a stream socket.
Normally, the reading loop looks something like the following:
my $buf = '';
while (1) {
    my $rv = sysread($socket, $buf, 64*1024, length($buf));
    if (!defined($rv)) {
        die("Can't read from socket: $!\n");
    }
    if (!$rv) {   # 0 bytes read means EOF
        die("Can't read from socket: Premature EOF\n") if length($buf);
        last;
    }
    # defined() must wrap the assignment; otherwise $msg would hold the
    # result of defined() instead of the extracted message
    while (defined(my $msg = check_for_full_message_and_extract_it_from_buf($buf))) {
        process_msg($msg);
    }
}
(Keep in mind that sysread returns as soon as there is some data, even if there's less data than requested.)
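A small demonstration of that behaviour, assuming a Unix-like system where `socketpair` with `AF_UNIX` is available (the pair stands in for a real TCP connection). The writer sends only 3 bytes, so a `sysread` asking for 64 KiB returns 3, and passing `length($buf)` as the offset appends the next read to the buffer instead of overwriting it.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# A connected pair of stream sockets stands in for a client/server link
# (assumes a Unix-like system).
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!\n";

my $buf = '';

# Only 3 bytes are available, so sysread returns 3 although 64 KiB were requested.
syswrite($writer, "abc") or die "syswrite: $!\n";
my $n1 = sysread($reader, $buf, 64*1024, length($buf));

# The offset length($buf) appends the new data to what is already buffered.
syswrite($writer, "def") or die "syswrite: $!\n";
my $n2 = sysread($reader, $buf, 64*1024, length($buf));

print "first read: $n1 bytes, second read: $n2 bytes, buffer: '$buf'\n";
# prints: first read: 3 bytes, second read: 3 bytes, buffer: 'abcdef'
```

This is why the loop has to look for complete messages in `$buf` after every read rather than assuming one read equals one message.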
For example, the inner loop for sentinel-terminated data would look like the following:
while ($buf =~ s/^(.*)\n//) {
    process_msg("$1");
}
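To see what that inner loop does to the buffer, here is a self-contained sketch in which `process_msg` is a hypothetical collector that just pushes each message onto an array. Complete lines are consumed; the trailing partial line stays in `$buf` until more data arrives.

```perl
use strict;
use warnings;

my @received;                              # hypothetical stand-in for process_msg
sub process_msg { push @received, $_[0] }

# Buffer as it might look after a read: two complete lines and a partial one.
my $buf = "foo;1\nbar;2\nba";

# Remove and process every complete newline-terminated message.
while ($buf =~ s/^(.*)\n//) {
    process_msg("$1");
}

print join(',', @received), " | left over: '$buf'\n";
# prints: foo;1,bar;2 | left over: 'ba'
```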
For example, the inner loop for length-prefixed blocks would look like the following:
while (1) {
    last if length($buf) < 4;
    my $len = unpack('N', $buf);
    last if length($buf) < 4+$len;
    substr($buf, 0, 4, '');
    my $msg = substr($buf, 0, $len, '');
    process_msg($msg);
}
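A runnable sketch of that loop, assuming each block is a 4-byte big-endian length (`N`) followed by the payload; `pack('N/a*', $payload)` produces exactly that framing, so it is used here to build a test buffer. `process_msg` is again a hypothetical collector.

```perl
use strict;
use warnings;

my @received;                              # hypothetical stand-in for process_msg
sub process_msg { push @received, $_[0] }

# Two complete blocks, then the start of a third whose 10-byte payload
# has only partly arrived.
my $buf = pack('N/a*', 'hello') . pack('N/a*', 'a;b;c') . pack('N', 10) . 'abc';

while (1) {
    last if length($buf) < 4;              # length prefix not complete yet
    my $len = unpack('N', $buf);
    last if length($buf) < 4+$len;         # payload not complete yet
    substr($buf, 0, 4, '');                # drop the length prefix
    my $msg = substr($buf, 0, $len, '');   # remove and return the payload
    process_msg($msg);
}

printf "%s | %d bytes left over\n", join(',', @received), length($buf);
# prints: hello,a;b;c | 7 bytes left over
```

The 7 leftover bytes (the 4-byte prefix plus `abc`) stay in `$buf` until the rest of the third block is read.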
In your particular case, you'd remove any data from the start of $buf that you want to ignore until you get to the part you're interested in, and then you'd start extracting the items you're interested in. This is vague, but I only have a vague description of the protocol to work with.
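One possible shape for that extraction step, as a sketch only: the question does not show the real markers, so `<START>` and `<END>` below are assumptions, and `extract_messages` is a hypothetical name. It discards garbage before each start marker, removes every complete message from the buffer, splits its body on semicolons, and leaves an incomplete message (or a start marker cut in half by a read) in the buffer for the next `sysread`.

```perl
use strict;
use warnings;

# Assumed markers: the real ones are not shown in the question.
my $START = '<START>';
my $END   = '<END>';

# Remove every complete START...END message from $$buf_ref and return one
# array ref of semicolon-separated fields per message.
sub extract_messages {
    my ($buf_ref) = @_;
    my @messages;
    while (1) {
        my $s = index($$buf_ref, $START);
        if ($s < 0) {
            # No start marker yet: keep only a tail that could be the
            # beginning of a start marker split across two reads.
            my $keep = length($START) - 1;
            substr($$buf_ref, 0, length($$buf_ref) - $keep, '')
                if length($$buf_ref) > $keep;
            last;
        }
        substr($$buf_ref, 0, $s, '');                   # drop leading garbage
        my $e = index($$buf_ref, $END, length($START));
        last if $e < 0;                                 # message not complete yet
        my $body = substr($$buf_ref, length($START), $e - length($START));
        substr($$buf_ref, 0, $e + length($END), '');    # consume the whole message
        push @messages, [ split /;/, $body ];
    }
    return @messages;
}

my $buf = "xx#garbage<START>a;b;c<END>noise<START>d;e<EN";
my @msgs = extract_messages(\$buf);
print join(' | ', map { join(',', @$_) } @msgs), " / left: '$buf'\n";
# prints: a,b,c / left: '<START>d;e<EN'
```

Called after each `sysread` that appends to `$buf`, this would play the role of the inner loop in the reading loop above.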
Upvotes: 5