Minregx

Reputation: 5

Appending data into pandas dataframe

I'm building a system where a Raspberry Pi receives data via Bluetooth and parses it into a pandas DataFrame for further processing. However, there are a few issues. Each Bluetooth packet is converted into a pandas Series object, which I tried, unsuccessfully, to append to an empty DataFrame. The splitting below is performed to extract telemetry from a Bluetooth packet.

The code creates a suitable DataFrame with the correct column names, but when I append to it, the Series object's row numbers become new columns. Each appended Series is a single row in the final DataFrame. What I want to know is: how do I add a Series object to the DataFrame so that its values are put into the columns with indices from 0 to 6 instead of from 7 to 14?

Edit: Added a screenshot with the output at the top and several pkt values below.

Edit2: Added full code per request. Added error traceback.

import time
import sys
import subprocess
import pandas as pd
import numpy as np

class Scan:
    def __init__(self, count, columns):
        self.running = True
        self.count = count
        self.columns = columns

    def run(self):
        i_count = 0
        p_data = pd.DataFrame(columns=self.columns, dtype='str')

        while self.running:
            output = subprocess.check_output(["commands", "to", "follow.py"]).decode('utf-8')
            p_rows = output.split(";")
            series_list = []
            print(len(self.columns))

            for packet in p_rows:
                pkt = pd.Series(packet.split(","),dtype='str', index=self.columns)
                pkt = pkt.replace('\n','',regex=True)
                print(len(pkt))
                series_list.append(pkt)
            p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T

            print(p_data.head())
            print(p_rows[0])
            print(list(p_data.columns.values))

            if i_count  == self.count:
                self.running = False
                sys.exit()
            else:
                i_count += 1
            time.sleep(10)

def main():
    columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
    scan = Scan(0, columns)

    while True:
        scan.run()

if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File "blescanner.py", line 48, in <module>
    main()
  File "blescanner.py", line 45, in main
    scan.run()
  File "blescanner.py", line 24, in run
    pkt = pd.Series(packet.split(","),dtype='str', index=self.columns)
  File "/mypythonpath/site-packages/pandas/core/series.py", line 262, in __init__
    .format(val=len(data), ind=len(index)))
ValueError: Length of passed values is 1, index implies 7

Upvotes: 0

Views: 3223

Answers (2)

dan_g

Reputation: 2795

You don't want to append to a DataFrame in that way. What you can do instead is create a list of series, and concatenate them together.

So, something like this:

series_list = []
for packet in p_rows:
    pkt = pd.Series(packet.split(","),dtype='str')
    print(pkt)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list), columns=self.columns, dtype='str')

As long as you don't specify ignore_index=True in the pd.concat call, the index will not be reset (the default is ignore_index=False).
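To make the index behaviour concrete, here is a minimal sketch using two made-up packets (the values are illustrative, not taken from the question's device). With the default setting each Series keeps its own labels; with ignore_index=True the result is renumbered.

```python
import pandas as pd

# Two illustrative packets (hypothetical values), each parsed into a Series.
a = pd.Series(["aa", "-24", "2800"], dtype="str")
b = pd.Series(["bb", "-30", "2900"], dtype="str")

kept = pd.concat([a, b])                      # default: ignore_index=False
reset = pd.concat([a, b], ignore_index=True)  # labels renumbered from 0

print(list(kept.index))   # [0, 1, 2, 0, 1, 2]
print(list(reset.index))  # [0, 1, 2, 3, 4, 5]
```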

Edit:

It's not clear from your question, but if you're trying to add the series as new columns (instead of stacking them on top of each other), then change the last line above to:

p_data = pd.concat(series_list, axis=1)
p_data.columns = self.columns

Edit2:

Still not entirely clear, but from your edit it sounds like you want each series to become a row, with the index of the series becoming your columns. I.e.:

series_list = []
for packet in p_rows:
    pkt = pd.Series(packet.split(","), dtype='str', index=self.columns)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T

Edit 3: Based on your picture of output, when you split on ; the last element in your list is empty. E.g.:

output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""

output.split(';')

['f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None',
 '\n            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None',
 '']

So instead of for packet in p_rows, use for packet in p_rows[:-1].
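Slicing off the last element assumes the output always ends with a ;. A slightly more defensive sketch, assuming that empty or whitespace-only fragments should simply be skipped, filters them out instead:

```python
output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""

# Keep only fragments that still contain something after stripping whitespace.
p_rows = [chunk.strip() for chunk in output.split(";") if chunk.strip()]

print(len(p_rows))  # 2
```

This also works when the trailing ; is missing, since nothing is dropped unconditionally.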

Full example:

import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']

output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""
p_rows = output.split(";")
series_list = []

for packet in p_rows[:-1]:
    pkt = pd.Series(packet.strip().split(","), dtype='str', index=columns)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T

produces

                 mac rssi voltage temperature  ad count t since boot other
0  f1:07:ad:6b:97:c8  -24    2800       23.00  17962365     25509655  None
1  f1:07:ad:6b:97:c8  -24    2800       23.00  17962365     25509655  None
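As an alternative sketch (not the approach used above), the same table can be built without intermediate Series objects by passing a list of split rows straight to pd.DataFrame. This assumes every non-empty packet has exactly seven comma-separated fields.

```python
import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']

output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""

# One list of fields per non-empty packet.
rows = [chunk.strip().split(",") for chunk in output.split(";") if chunk.strip()]

# Each inner list becomes one row; the column names are applied directly.
p_data = pd.DataFrame(rows, columns=columns)

print(p_data.shape)  # (2, 7)
```

All values stay as strings here, so numeric columns would still need converting before any arithmetic.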

Upvotes: 1

Joe Plumb

Reputation: 499

This is because of conflicting keys between the p_data df and pkt data in your append statement - you need to ensure that the keys in pkt match the column headings in the p_data dataframe you are appending to.

Fix this by either re-naming the columns in the p_data dataframe to the numbers you are seeing in the pkt, or by re-naming the keys in pkt before you append the data.
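A small sketch of the label mismatch, using a two-column frame with made-up values (the names mismatched and matched are only for illustration). It uses pd.concat rather than append, but the alignment on labels is the same idea.

```python
import pandas as pd

# Existing frame with named columns (illustrative values).
p_data = pd.DataFrame([["f1:07:ad:6b:97:c8", "-23"]], columns=["mac", "rssi"])

# A Series built from a plain list gets the integer labels 0 and 1.
pkt = pd.Series(["f1:07:ad:6b:97:c8", "-24"])

# Labels 0 and 1 do not match "mac" and "rssi", so extra columns appear.
mismatched = pd.concat([p_data, pkt.to_frame().T], ignore_index=True)
print(mismatched.shape)  # (2, 4)

# Relabel the Series with the column names before combining.
pkt.index = p_data.columns
matched = pd.concat([p_data, pkt.to_frame().T], ignore_index=True)
print(list(matched.columns))  # ['mac', 'rssi']
```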

Edit: Following further discussion, I agree that column names do not come into it, since the incoming data is in the same order as the existing df. Simply wrap pd.DataFrame() around the pkt object and make sure the row of data has the right shape (one row, seven columns) when appending to get the desired result.

import pandas as pd
import numpy as np

# Set initial df with data
d = pd.DataFrame(['f1:07:ad:6b:97:c8', '-23', '2900', '24.00', '17962371', '25509685', 'None']).T
p_data = pd.DataFrame(data=d, dtype='str')

# Parse new incoming data
output = "f1:07:ad:6b:97:c8;-24;2800;23.00;17962365;25509655;None"
pkt = output.split(";")

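# Append new data to existing dataframe.
# pkt (not p_rows) holds the parsed fields; pd.concat is used because
# DataFrame.append was removed in pandas 2.0.
p_data = pd.concat([p_data, pd.DataFrame(data=pkt).T], ignore_index=True)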

Upvotes: 0
