Reputation: 363
I wrote a python 3 script which tests an SPI link to an FPGA. It runs on an Raspberry Pi 3. The test works like this: after putting the FPGA in test mode (a push switch), send the first byte, which can be any value. Then further bytes are sent indefinitely. Each one increments by the first value sent, truncated to 8 bits. Thus, if the first value is 37, the FPGA expects the following sequence:
37, 74, 111, 148, 185, 222, 4, 41 ...
Some additional IO pins are used to signal between the devices - RUN (RPi output) starts the test (necessary because the FPGA times out in about 15ms if it expects a byte) and ERR (FPGA output) signals an error. Errors can thus be counted at both ends.
In addition, the RPi script writes a one line summary of bytes sent and number of erros every million bytes.
All of this works just fine. But after running for about 3 days, I get the following error on the RPi:
free(): invalid pointer: 0x00405340
I get this exact same error on two identical test setups, even the same memory address. The last report says "4294M bytes sent, 0 errors"
I seem to have proved the SPI link, but I am concerned that this long-running program crashes for no apparent reason.
Here is the important part of my test code:
def _report(self, msg):
now = datetime.datetime.now()
os.system("echo \"{} : {}\" > spitest_last.log".format(now, msg))
def spi_test(self):
global end_loop
input("Put the FPGA board into SPI test mode (SW1) and press any key")
self._set_run(True)
self.END_LOOP = False
print("SPI test is running, CTRL-C to end.")
# first byte is sent without LOAD, this is the seed
self._send_byte(self._val)
self._next_val()
end_loop = False
err_flag = False
err_cnt = 0
byte_count = 1
while not end_loop:
mb = byte_count % 1000000
if mb == 0:
msg = "{}M bytes sent, {} errors".format(int(byte_count/1000000), err_cnt)
print("\r" + msg, end="")
self._report(msg)
err_flag = True
else:
err_flag = False
#print("sending: {}".format(self._val))
self._set_load(True)
if self._errors and err_flag:
self._send_byte(self._val + 1)
else:
self._send_byte(self._val)
if self.is_error():
err_cnt += 1
msg = "{}M bytes sent, {} errors".format(int(byte_count/1000000), err_cnt)
print("\r{}".format(msg), end="")
self._report(msg)
self._set_load(False)
# increase the value by the seed and truncate to 8 bits
self._next_val()
byte_count += 1
# test is done
input("\nSPI test ended ({} bytes sent, {} errors). Press ENTER to end.".format(byte_count, err_cnt))
self._set_run(False)
(Note for clarification : there is a command line option to artifically create an error every million bytes. Hence the " err_flag" variable.)
I've tried using python3 in console mode, and there seems to be no issue with the size of the byte_count variable (there shouldn't be, according to what I have read about python integer size limits).
Anyone have an idea as to what might cause this?
Upvotes: 5
Views: 1512
Reputation: 10514
I belive the original asker's issue is a reference leak. Specifically py-spidev issue 91. Said reference leak has been fixed in the 3.5 release of spidev.
Python uses a shared pool of objects to represent small integer values*, rather than re-creating them each time. So when code leaks references to small numbers the result is not a memory leak but instead a reference count that keeps increasing. The python spidev library had an issue where it leaked references to small integers in this way.
On a 32-bit system** the eventual result is that the reference count overflows. Then something decrements the overflowed reference count and the reference counting system frees the object.
What I can't explain is the other answer that claims they can still reproduce the issue with 3.5. This issue was supposed to have been fixed in that version.
* Specifically numbers in the range -3 to 256 inclusive, so anything that can be represented in an unsigned byte plus a few negative values (presumably because they are commonly used as error returns) and 256 (presumably because it's often used as a multiplier).
** On a 64-bit system the reference count will not overflow within a human lifetime.
Upvotes: 0
Reputation: 49
This issue is connected to spidev versions older than 3.5 only. The comments below were done under assumption that I was using the upgraded version of spidev.
#############################################################################
I can confirm this problem. It is persistent with both RPi3B and RPi4B. Using python 3.7.3 at both RPi3 and RPi4. The version of spidev which I tried were 3.3, 3.4 and the latest 3.5. I was able to reproduce this error several times by simply looping through this single line.
spidevice2.xfer2([0x00, 0x00, 0x00, 0x00])
It takes up to 11 hours depending on the RPi version. After 1073014000 calls (rounded to 1000), the script crashes because of "invalid pointer". The total amount of bytes sent is the same as in danmcb's case. It seems as if 2^32 bytes represent a limit.
I tried different approaches. For example, calling close() from time to time followed by open(). This did not help.
Then, I tried to create the spiDev object locally, so it would re-created for every batch of data.
def spiLoop():
spidevice2 = spidev.SpiDev()
spidevice2.open(0, 1)
spidevice2.max_speed_hz = 15000000
spidevice2.mode = 1 # Data is clocked in on falling edge
for j in range(100000):
spidevice2.xfer2([0x00, 0x00, 0x00, 0x00])
spidevice2.close()
It still crashed at after approx. 2^30 calls of xfer2([0x00, 0x00, 0x00, 0x00]) which corresponds to approx. 2^32 bytes.
EDIT1
To speed up the process, I was sending in blocks of 4096 bytes using the code below. And I repeatedly created the SpiDev object locally. It took 2 hours to arrive at 2^32 bytes count.
def spiLoop():
spidevice2 = spidev.SpiDev()
spidevice2.open(0, 1)
spidevice2.max_speed_hz = 25000000
spidevice2.mode = 1 # Data is clocked in on falling edge
to_send = [0x00] * 2**12 # 4096 bytes
for j in range(100):
spidevice2.xfer2(to_send)
spidevice2.close()
del spidevice2
def runSPI():
for i in range(2**31 - 1):
spiLoop()
print((2**12 * 100 * (i + 1)) / 2**20, 'Mbytes')
EDIT2
Reloading the spidev on the fly does not help either. I tried this code on both RPi3 and RPi4 with the same result:
import importlib
def spiLoop():
importlib.reload(spidev)
spidevice2 = spidev.SpiDev()
spidevice2.open(0, 1)
spidevice2.max_speed_hz = 25000000
spidevice2.mode = 1 # Data is clocked in on falling edge
to_send = [0x00] * 2**12 # 4096 bytes
for j in range(100):
spidevice2.xfer2(to_send)
spidevice2.close()
del spidevice2
def runSPI():
for i in range(2**31 - 1):
spiLoop()
print((2**12 * 100 * (i + 1)) / 2**20, 'Mbytes')
EDIT3
Executing the code snippet did not isolate the problem either. It crashed after the 4th chuck of 1Gbyte-data was sent.
program = '''
import spidev
spidevice = None
def configSPI():
global spidevice
# We only have SPI bus 0 available to us on the Pi
bus = 0
#Device is the chip select pin. Set to 0 or 1, depending on the connections
device = 1
spidevice = spidev.SpiDev()
spidevice.open(bus, device)
spidevice.max_speed_hz = 250000000
spidevice.mode = 1 # Data is clocked in on falling edge
def spiLoop():
to_send = [0xAA] * 2**12
loops = 1024
for j in range(loops):
spidevice.xfer2(to_send)
return len(to_send) * loops
configSPI()
bytes_total = 0
while True:
bytes_sent = spiLoop()
bytes_total += bytes_sent
print(int(bytes_total / 2**20), "Mbytes", int(1000 * (bytes_total / 2**30)) / 10, "% finished")
if bytes_total > 2**30:
break
'''
for i in range(100):
exec(program)
print("program executed", i + 1, "times, bytes sent > ", (i + 1) * 2**30)
Upvotes: 4