Reputation: 11

Python byte string data stream

I am working in Python 3 on a Raspberry Pi 3. My application is data acquisition and logging. My question is about properly splitting and interpreting serial data received as byte strings:

Two or more (USB) serial devices stream data continuously, without being queried. The raw data accumulate in each port's input buffer, and my application reads them in the following general form. The actual string tends to be much longer, but this should be sufficient as an example:

b'+00000\r\n-00210\r\n+00360\r\n+00300\r\n-00163\r\n+00399\r\n'

I am using serial.read(serial.in_waiting) to receive the data. I find this to be the best method, as my independent data sources are asynchronous, continuously spitting out their values (approximately 50 samples per second), and have slightly differing data-broadcast rates (just enough to be a nuisance).

I have found that I cannot reliably use "readline()" to simplify my task for a few reasons, including - and please do comment on this point if you have any insights on this - the fact that, for whatever reason, the "in_waiting" value is not reset to zero on my system after a "readline()".

Unfortunately, the "in_waiting" approach often produces data that are not as neatly terminated as shown above. Possible variants, representing what might make it over the serial port on a given read attempt, include:

b'+00000\r\n-00210\r\n+00360\r\n+00300\r\n-00163\r\n+003'

b'+00000\r\n-00210\r\n+00360\r\n+00300\r\n-00163\r'

b'+00000\r\n-00210\r\n+00360\r\n+00300\r\n-00163'

b'+00000\r\n-00210\r\n+00360\r\n+00300\r\n-'

That is, the last value in a read is not always fully formed when it is read.

I have been attempting to use "decode" and "split" and "list" and "map" functions to interpret all available, complete data (those data that are properly terminated with \r\n) in order that I may do additional work with the numerical values on the fly. All the while, my intent is to retain any partial data that may have been received on the end of the read, so that they can be appended on the next read cycle.

My attempts have not met with success for all cases above, and that is why I am appealing to members more familiar with the Python programming language than I am for guidance.

Please kindly consider commenting on what would be the most efficient way (in Python) to:

1. Get from data such as I show above into a list of integer values.

2. Exclude incomplete, trailing data from the conversion, if it is not properly terminated.

3. Retain any incomplete trailing data for appending on subsequent read.

If you have dealt with a circumstance like mine in the past, I am hoping to learn from your experience, as I continue exploring the matter on my own.

Upvotes: 0

Answers (2)

sevensegment

Reputation: 11

I will preface this "answer" to my own question with two things: (1) I am not 100% sure it's considered proper etiquette for me to do so. If not, I ask others' pardon. And (2), the solution I will post is still formative. That is, it works, but I would really like to grasp a better, more concise and/or efficient method than my own.

That said, here's what I have done, stripped-down to the germane details and just a little contextual fluff:

For the benefit of the visitor to this little topic, I am reading data from a plurality of serial devices (9600, 8N1 - not important) in chunks having identical lengths - at least when they are complete, properly terminated. The OP contains representative samples of the format.

To read and store the data, I am currently using:

#ACQUIRE MOST-RECENT DATA
MRD1 = ser1.read(ser1.in_waiting)

#LOG RAW DATA IMMEDIATELY
fil1.write(MRD1)

And, recall, originally I had assumed that reading full "lines" rather than strings of bytes would have been the preferred method, in order to avoid split entries, all the overhead, etc. But no. There are all sorts of things to smack-down that naive assumption. One of them is that "readline()" isn't clearing the "in_waiting" count in my case. No idea why. The other is that there seems to be no way to get the port or the connected device to report how many fully-terminated lines are in the queue. Is that even true? No idea. Just haven't found a way. So, bytes it is. Fine, I just want my doggone data, however I can get it, and to know that none of it will go AWOL.
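
One possible explanation for the "in_waiting" behaviour, offered as an assumption and not verified on this hardware: "readline()" returns as soon as it has one terminated line, so any bytes that have already arrived behind that line stay in the input buffer and are still counted. The sketch below only imitates that situation; io.BytesIO stands in for the serial port, and bytes_left is a made-up stand-in for in_waiting.

```python
import io

# io.BytesIO stands in for the serial input buffer (illustration only).
port = io.BytesIO(b'+00000\r\n-00210\r\n+003')

line = port.readline()  # consumes only the first terminated line
bytes_left = len(port.getvalue()) - port.tell()  # rough analogue of in_waiting

print(line, bytes_left)
```

With a device that transmits continuously, new bytes also keep arriving between the two calls, so a non-zero count right after a "readline()" would not by itself indicate a fault.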

So, I am doing the following with what I'm reading-in from the serial port to cope with all that I've murmured about:

###        CONDITION FIRST CHANNEL BYTE-STRING DATA FOR ANALYSIS

#Decode to ascii
MRDs1 = MRD1.decode('ascii')

#Add previous non-terminated 'orphan' data to the recent chunk of data
S1 = orphan1 + MRDs1

#Normalize non-numeric characters to * delimiter. What a mess...
SD1 = S1.replace("\r\n","*").replace("\r","").replace("\n","").split("*")

#Assess data stream for new non-terminated entry
orphan1 = ""
if len(SD1[-1]) == 0:
    del SD1[-1]
elif len(SD1[-1]) < 6: #this '6' is only for my particular case.
    orphan1 = SD1.pop() #Is this the best way to knock-off orphan bytes?

#Check for meaningless/null leading data (might not even be necessary)
#SD1 can be empty here (e.g. an empty read, or nothing but an orphan), so test it first.
if SD1 and len(SD1[0]) < 6: #Same drill as above, but on the front end.
    del SD1[0] #Is this really how it's best done in python?

#Extract numerical integer value list
ND1 = list(map(int,SD1))

All inside of a main loop, of course. From there, ND1 ("channel-1 new data" - I have up to four channels) gets shipped-off to a ring buffer for FFT (not what we're about here, but a great topic on its own).
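
For comparison, here is a minimal sketch of the same per-channel step done directly on the bytes, without relying on the fixed width of 6. It assumes every complete value ends in \r\n, as in the samples in the question. The name split_complete is made up for illustration, and the three chunks are invented test data, not real readings.

```python
def split_complete(buffer):
    """Split a byte buffer into (list of ints, unterminated tail)."""
    # rpartition cuts at the LAST b'\r\n'; everything after it is incomplete.
    complete, sep, orphan = buffer.rpartition(b'\r\n')
    if not sep:
        # No terminator at all yet: keep the whole buffer for the next read.
        return [], buffer
    # int() accepts ASCII bytes with a sign and leading zeros, e.g. b'-00210'.
    values = [int(field) for field in complete.split(b'\r\n') if field]
    return values, orphan


orphan1 = b''
for chunk in (b'+00000\r\n-00210\r\n+003', b'60\r\n+00300\r', b'\n-00163\r\n'):
    ND1, orphan1 = split_complete(orphan1 + chunk)
    print(ND1, orphan1)
```

Unlike the length check above, this sketch does not discard a truncated first value if the very first read starts in the middle of a number, so a start-up check may still be wanted.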

So, what I'd really find instructive and what I'd surely appreciate, as will, I hope, other readers of this little topic, is if experienced community members can offer insights or suggestions in relation to what I've pasted in above. Criticisms absolutely appreciated as well. Throw tomatoes, as long as they're spherical. I'm a physicist. Not a real coder.

To be clear, yes, the approach is working now. It never misses a data point. Sure, but is it optimal? I have no idea. Probably not. Nothing I do is optimal. Is it as fast as possible with Python? I doubt it. Recall, I'm on a Raspberry Pi 3, so I am on a quest for efficiency. The same concern haunts my use of FFT (I'm using numpy rfft. Is that even good?) and my plot updates (Geez, my FFT plots in matplotlib seem slow.).

At any rate, I think my Python skills are ripe for improvement. Thanks, if you can help me do that. In any event, I intend to post my/our complete code, for better or worse, at the conclusion of this thread, in the hopes of helping the next unbeliever get off of dead-center, where I endlessly dwell. I'll give it a few days...

Upvotes: 1

orangeInk

Reputation: 1410

[This is not an answer, but it was too long for a comment]

Take everything I say with a grain of salt, I'm no expert, I'm just throwing some ideas around!

Are all complete values the same length? In that case you could dump it all into an io.BytesIO stream (https://docs.python.org/3/library/io.html#binary-i-o) and grab the next n bytes (where n = len(complete_value)).
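
A rough sketch of that fixed-length idea, assuming each record is exactly 8 bytes (sign, five digits, \r\n) as in the question's samples, and that the data start on a record boundary. RECORD_LEN and read_fixed_records are names made up for illustration.

```python
import io

RECORD_LEN = 8  # assumed: sign + 5 digits + b'\r\n', as in the question's samples


def read_fixed_records(data):
    """Return (list of ints, leftover bytes) from fixed-width records."""
    stream = io.BytesIO(data)
    values = []
    while True:
        record = stream.read(RECORD_LEN)
        if len(record) < RECORD_LEN:
            # Short read: an incomplete record (or nothing) is left over.
            return values, record
        if not record.endswith(b'\r\n'):
            raise ValueError(f'stream is misaligned at {record!r}')
        values.append(int(record[:-2]))


values, leftover = read_fixed_records(b'+00000\r\n-00210\r\n+003')
print(values, leftover)
```

The alignment check matters here: if a single byte is ever lost, fixed-length slicing stays out of step for good, whereas splitting on the terminator recovers at the next \r\n.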

You say you've tried solving this with decode, split, list, and map. Could you explain what you did exactly? Because I don't see what's wrong with something like:

incomplete_value = b''
my_ints = []
while there_is_data(): # placeholder for your own loop condition
    tmp = incomplete_value + get_my_data() # placeholder: get a new batch of data and add it after the leftover
    tmp_split = tmp.split(b'\r\n') # split the data on the \r\n terminators
    complete_data = tmp_split[:-1] # get all but the last item
    incomplete_value = tmp_split[-1] # save the last item (b'' if the batch ended on \r\n)
    my_ints += [int(x) for x in complete_data] # int() parses ASCII bytes such as b'-00210'
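
To try the loop above without any hardware, the sketch below feeds the same logic a made-up list of chunks that are cut at awkward places, including between \r and \n. parse_chunks and the chunk list are hypothetical, for illustration only. The conversion uses int(), which parses ASCII digits; int.from_bytes would instead treat the bytes as a binary number.

```python
def parse_chunks(chunks):
    """Yield each complete integer from an iterable of byte chunks."""
    incomplete_value = b''
    for chunk in chunks:
        # Everything before the final split item is complete; the final item
        # is carried over and glued to the front of the next chunk.
        *complete_data, incomplete_value = (incomplete_value + chunk).split(b'\r\n')
        for field in complete_data:
            yield int(field)


chunks = [b'+00000\r\n-002', b'10\r\n+00360\r', b'\n+00300\r\n-', b'00163\r\n+003']
print(list(parse_chunks(chunks)))
```

The trailing b'+003' is never emitted, because no terminator for it has arrived yet.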

Upvotes: 0
