iFunction
iFunction

Reputation: 1268

how to access value from pandas dataframe

This script trawls through a folder to access the last column and row value of a dataframe. I have 5 files and should end up with 5 values that I want to add. But I can't seem to get them as values even though I can see them very clearly. Here is the code:

#!/usr/bin/env python3

import os
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

path = 'X:/VBOX_Data/Job_5_Shakedown/*.vbo'

count = 0
total_time = 0
for vbo in glob.glob(path, recursive=True):
    print(vbo)
    df = pd.read_csv(vbo,
                     delim_whitespace=True,
                     encoding='iso-8859-1',
                     header=90)
    t = df.tail(1).avitime
    print(t)   # to demonstrate output
    total_time += t
    count += 1

print(total_time)

This is the output:

(C:\Users\ifunction\AppData\Local\Continuum\Anaconda3) c:\Users\ifunction\Prog
ramming>python test.py
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922103504_0001.vbo
377    37800.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922110211_0001.vbo
230    23040.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922122444_0001.vbo
1070    135840.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922135839_0001.vbo
7579    757752.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922144312_0001.vbo
6219    621942.0
Name: avitime, dtype: float64
377   NaN
Name: avitime, dtype: float64

I think that I am creating five one by one dataframes and not actually accessing the value. How can I get this as a total value which should be a value of 1576374.

Upvotes: 1

Views: 916

Answers (2)

jezrael
jezrael

Reputation: 862406

It seem you need iat or iloc for return scalar value:

t += df.avitime.iat[-1]
#t += df.avitime.iloc[-1]

Sample:

df = pd.DataFrame({'avitime':[1,2,3]})

print (df)
   avitime
0        1
1        2
2        3

t = df.avitime.iat[-1]
print (t)
3

Another solution is use list comprehension where output is list of last values of column avitime, last use sum of list:

files = glob.glob(path, recursive=True)
last_avitime = [pd.read_csv(vbo, delim_whitespace=True, encoding='iso-8859-1', header=90)
                  .avitime.iat[-1] 
                for vbo in files]

total_time = sum(last_avitime)
print (total_time)

Timings:

df = pd.DataFrame({'avitime':[1,2,3]})
print (df)

In [223]: %timeit (df.tail(1).avitime.values[0])
10000 loops, best of 3: 145 µs per loop

In [224]: %timeit (df.iloc[-1]['avitime'])
10000 loops, best of 3: 92.6 µs per loop

In [225]: %timeit (df.avitime.iat[-1])
100000 loops, best of 3: 14.9 µs per loop

In [226]: %timeit (df.avitime.iloc[-1])
10000 loops, best of 3: 21.3 µs per loop

In [227]: %timeit (df.get_value(df.index[-1], 'avitime'))
100000 loops, best of 3: 5.87 µs per loop

More information about get_value.

Upvotes: 0

Julien Marrec
Julien Marrec

Reputation: 11895

Replace

df.tail(1).avitime.values[0]

with :

df.tail(1).avitime.values[0]
# Or
df.iloc[-1]['avitime']

Currently, you're returning a pd.Series, not a scalar:

In [1]:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(10,2), columns=['A','avitime'])

In [2]: type(df.tail(1).avitime)
Out[2]: pandas.core.series.Series

Upvotes: 1

Related Questions