Reputation: 5
I'm building a system where a Raspberry Pi receives data via Bluetooth and parses it into a pandas DataFrame for further processing. However, there are a few issues. Each Bluetooth packet is converted into a pandas Series object, which I tried, unsuccessfully, to append to an empty DataFrame. The splitting below is performed to extract telemetry from a Bluetooth packet.
The code creates a suitable DataFrame with the correct column names, but when I append to it, the Series object's row numbers become new columns. Each appended Series is a single row in the final DataFrame. What I want to know is: how do I add a Series object to the DataFrame so that the values are put into the columns with indices 0 to 6 instead of 7 to 14?
Edit: Added a screenshot with the output at the top and several pkt values below.
Edit2: Added full code per request. Added error traceback.
import time
import sys
import subprocess
import pandas as pd
import numpy as np

class Scan:
    def __init__(self, count, columns):
        self.running = True
        self.count = count
        self.columns = columns

    def run(self):
        i_count = 0
        p_data = pd.DataFrame(columns=self.columns, dtype='str')
        while self.running:
            output = subprocess.check_output(["commands", "to", "follow.py"]).decode('utf-8')
            p_rows = output.split(";")
            series_list = []
            print(len(self.columns))
            for packet in p_rows:
                pkt = pd.Series(packet.split(","), dtype='str', index=self.columns)
                pkt = pkt.replace('\n', '', regex=True)
                print(len(pkt))
                series_list.append(pkt)
            p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T
            print(p_data.head())
            print(p_rows[0])
            print(list(p_data.columns.values))
            if i_count == self.count:
                self.running = False
                sys.exit()
            else:
                i_count += 1
            time.sleep(10)

def main():
    columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
    scan = Scan(0, columns)
    while True:
        scan.run()

if __name__ == '__main__':
    main()
Traceback (most recent call last):
  File "blescanner.py", line 48, in <module>
    main()
  File "blescanner.py", line 45, in main
    scan.run()
  File "blescanner.py", line 24, in run
    pkt = pd.Series(packet.split(","),dtype='str', index=self.columns)
  File "/mypythonpath/site-packages/pandas/core/series.py", line 262, in __init__
    .format(val=len(data), ind=len(index)))
ValueError: Length of passed values is 1, index implies 7
Upvotes: 0
Views: 3223
Reputation: 2795
You don't want to append to a DataFrame in that way. What you can do instead is create a list of series, and concatenate them together.
So, something like this:
series_list = []
for packet in p_rows:
    pkt = pd.Series(packet.split(","), dtype='str')
    print(pkt)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list), columns=self.columns, dtype='str')
As long as you don't specify ignore_index=True in the pd.concat call, the index will not be reset (the default is ignore_index=False).
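A minimal sketch of what ignore_index changes in pd.concat, using two made-up Series (a and b are illustrative names, not part of the question's data):

```python
import pandas as pd

# Two small illustrative Series, each with its own default 0-based index
a = pd.Series(["x", "y"])
b = pd.Series(["z"])

kept = pd.concat([a, b])                      # default ignore_index=False
reset = pd.concat([a, b], ignore_index=True)  # index renumbered from 0

print(list(kept.index))   # [0, 1, 0]
print(list(reset.index))  # [0, 1, 2]
```

With the default, each Series keeps its own labels, so duplicate index values can appear in the result.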
Edit:
It's not clear from your question, but if you're trying to add the series as new columns (instead of stacking them on top of each other), then change the last line above to:
p_data = pd.concat(series_list, axis=1)
p_data.columns = self.columns
Edit2:
Still not entirely clear, but from your edit it sounds like you want each series to become a row, with the index of the series becoming your columns. I.e.:
series_list = []
for packet in p_rows:
    pkt = pd.Series(packet.split(","), dtype='str', index=self.columns)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T
Edit 3:
Based on your picture of the output, when you split on ; the last element in your list is empty. E.g.:
output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""
output.split(';')
['f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None',
'\n f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None',
'']
So instead of for packet in p_rows, do for packet in p_rows[:-1].
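The link between that empty last element and the ValueError in the question can be checked with a short sketch. The sample packet is the one used in this answer; last_fields and error are illustrative names. The exact wording of the exception message differs between pandas versions, so only the exception type is checked here.

```python
import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
output = "f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"

p_rows = output.split(";")            # the trailing ';' leaves '' as the last element
last_fields = p_rows[-1].split(",")   # ''.split(',') gives [''], i.e. one value

try:
    pd.Series(last_fields, index=columns)   # 1 value against 7 index labels
    error = None
except ValueError as exc:
    error = exc

print(len(last_fields), type(error).__name__)  # 1 ValueError
```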
Full example:
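import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""
p_rows = output.split(";")
series_list = []
# Skip the empty string left after the final ';'
for packet in p_rows[:-1]:
    # strip() removes the newline between packets before splitting into fields
    pkt = pd.Series(packet.strip().split(","), dtype='str', index=columns)
    series_list.append(pkt)
# Each Series is a column after concat(axis=1); transpose so each becomes a row
p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T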
produces
                 mac rssi voltage temperature  ad count t since boot other
0  f1:07:ad:6b:97:c8  -24    2800       23.00  17962365     25509655  None
1  f1:07:ad:6b:97:c8  -24    2800       23.00  17962365     25509655  None
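As a possible alternative to slicing with [:-1], empty packets can be filtered out wherever they occur and the rows passed to pd.DataFrame directly. This is only a sketch, not the approach taken above, and it assumes every non-empty packet has exactly seven comma-separated fields; rows is an illustrative name.

```python
import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""

# Keep only non-empty packets, then split each one into its fields
rows = [p.strip().split(",") for p in output.split(";") if p.strip()]

# Each inner list becomes one row; the values stay as strings
p_data = pd.DataFrame(rows, columns=columns)
print(p_data.shape)  # (2, 7)
```

Filtering on p.strip() also handles output that does not end with a ; or that contains blank packets in the middle.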
Upvotes: 1
Reputation: 499
This is because of conflicting keys between the p_data df and the pkt data in your append statement. You need to ensure that the keys in pkt match the column headings in the p_data dataframe you are appending to.
Fix this by either renaming the columns in the p_data dataframe to the numbers you are seeing in pkt, or by renaming the keys in pkt before you append the data.
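A small sketch of that key mismatch, under the assumption that pd.concat can stand in for the append call (the original append code is not shown in the question). The names unlabeled, labeled, mismatched, and matched are illustrative.

```python
import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
p_data = pd.DataFrame(columns=columns)
values = "f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None".split(",")

# Keys 0..6 do not match the column names, so seven extra columns appear
unlabeled = pd.Series(values).to_frame().T
mismatched = pd.concat([p_data, unlabeled], ignore_index=True)

# Keys equal to the column names land in the existing columns
labeled = pd.Series(values, index=columns).to_frame().T
matched = pd.concat([p_data, labeled], ignore_index=True)

print(mismatched.shape)  # (1, 14)
print(matched.shape)     # (1, 7)
```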
Edit: Following further discussion, agreed that column names do not come into it, as the incoming data is in the same order as the existing df. Simply wrap pd.DataFrame() around the pkt object and make sure the row of data is in the right shape when appending to achieve the desired result.
import pandas as pd
import numpy as np
# Set initial df with data
d = pd.DataFrame(['f1:07:ad:6b:97:c8', '-23', '2900', '24.00', '17962371', '25509685', 'None']).T
p_data = pd.DataFrame(data=d, dtype='str')
# Parse new incoming data
output = "f1:07:ad:6b:97:c8;-24;2800;23.00;17962365;25509655;None"
pkt = output.split(";")
# Append new data to existing dataframe: wrap pkt in a one-row DataFrame.
# pd.concat is used here because DataFrame.append was removed in pandas 2.0.
p_data = pd.concat([p_data, pd.DataFrame(data=pkt).T], ignore_index=True)
Upvotes: 0