Reputation: 1000

Appending Rows to a Pandas DataFrame

I'm trying to read an analog signal from a Measurement-Computing Personal Measurement Device (PMD-1208FS) and then write it to a file with a corresponding timestamp for each observation. I would like to append to this file once every second with a new observation.

The PyUniversalLibrary allows me to read from the device, but I'm stuck trying to figure out how to save the information into a dataframe. This example was helpful for reading data from the PMD, but it doesn't provide any data logging examples.

The example below gets close to solving this problem, but the df.append(pd.DataFrame() function is not providing me with the desired result. This function ends up appending the most recent dataframe to the bottom of the previously saved one, rather than just appending the new data. The result is a dataframe with many duplicate dataframes in sequence.

Here's my code:

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp
df = pd.DataFrame()


## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    data.append(EngUnits)
    times.append(datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    co.append(ppm)
    ## This line of code is not providing the desired result:
    df = df.append(pd.DataFrame({'co':ppm, 'volts':data, 'datetime':times})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

Current Output:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.8    2017-05-03 15:57:19   1.38    
2    13.9    2017-05-03 15:57:20   1.39
3    13.8    2017-05-03 15:57:19   1.38
4    13.9    2017-05-03 15:57:20   1.39
5    14.2    2017-05-03 15:57:21   1.42

Desired Output:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.9    2017-05-03 15:57:20   1.39
2    14.2    2017-05-03 15:57:21   1.42

Upvotes: 3

Answers (3)

Matt Morgan

Reputation: 43

If you're just looking to append, then you don't need a counter with .loc. You can just change it to df.loc[len(df)] = row . This will always write a new row at the end of the DataFrame.

Updated code from piRSquared code here:

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[len(df)] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    ## This line of code is not providing the desired result:
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

Upvotes: 2

plasmon360

Reputation: 4199

you are appending a dataframe with lists (that grow with time) for each field every time it enters a while loop. But you should be adding a dataframe with a list with only one element for each field at a time. please see example below

you are essentially doing this:

co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp

df = pd.DataFrame()
for i in range(0,5):
    data.append(i)
    times.append(i)
    co.append(i)
    df = df.append(pd.DataFrame({'co':co, 'volts':data, 'datetime':times}))
print df

which results in

   co  datetime  volts
0   0         0      0
0   0         0      0
1   1         1      1
0   0         0      0
1   1         1      1
2   2         2      2
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
4   4         4      4

but you should be doing this

df = pd.DataFrame()
for i in range(0,5):
    df = df.append(pd.DataFrame({'co':[i], 'volts':[i], 'datetime':[i]}))
print df

which results in

   co  datetime  volts
0   0         0      0
0   1         1      1
0   2         2      2
0   3         3      3
0   4         4      4

so your code should like

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty dataframe to fill:
df = pd.DataFrame()

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    times = (datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    df = df.append(pd.DataFrame({'co':[ppm], 'volts':[EngUnits], 'datetime':[times]})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

Upvotes: 1

piRSquared

Reputation: 294278

Since you don't use the index specifically, I'd keep a counter and use it to add a new row to an existing dataframe.

I'd rewrite the while loop like this

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

counter = 0
while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[counter] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    ## This line of code is not providing the desired result:
    counter += 1
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)