Reputation: 1268
This script trawls through a folder to access the last column and row value of a dataframe. I have 5 files and should end up with 5 values that I want to add. But I can't seem to get them as values even though I can see them very clearly. Here is the code:
#!/usr/bin/env python3
import os
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
path = 'X:/VBOX_Data/Job_5_Shakedown/*.vbo'
count = 0
total_time = 0
for vbo in glob.glob(path, recursive=True):
print(vbo)
df = pd.read_csv(vbo,
delim_whitespace=True,
encoding='iso-8859-1',
header=90)
t = df.tail(1).avitime
print(t) # to demonstrate output
total_time += t
count += 1
print(total_time)
This is the output:
(C:\Users\ifunction\AppData\Local\Continuum\Anaconda3) c:\Users\ifunction\Prog
ramming>python test.py
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922103504_0001.vbo
377 37800.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922110211_0001.vbo
230 23040.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922122444_0001.vbo
1070 135840.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922135839_0001.vbo
7579 757752.0
Name: avitime, dtype: float64
X:/VBOX_Data/Job_5_Shakedown\P1GTR__20150922144312_0001.vbo
6219 621942.0
Name: avitime, dtype: float64
377 NaN
Name: avitime, dtype: float64
I think that I am creating five one by one dataframes and not actually accessing the value. How can I get this as a total value which should be a value of 1576374.
Upvotes: 1
Views: 916
Reputation: 862406
It seem you need iat
or iloc
for return scalar value:
t += df.avitime.iat[-1]
#t += df.avitime.iloc[-1]
Sample:
df = pd.DataFrame({'avitime':[1,2,3]})
print (df)
avitime
0 1
1 2
2 3
t = df.avitime.iat[-1]
print (t)
3
Another solution is use list comprehension
where output is list of last values of column avitime
, last use sum
of list:
files = glob.glob(path, recursive=True)
last_avitime = [pd.read_csv(vbo, delim_whitespace=True, encoding='iso-8859-1', header=90)
.avitime.iat[-1]
for vbo in files]
total_time = sum(last_avitime)
print (total_time)
Timings:
df = pd.DataFrame({'avitime':[1,2,3]})
print (df)
In [223]: %timeit (df.tail(1).avitime.values[0])
10000 loops, best of 3: 145 µs per loop
In [224]: %timeit (df.iloc[-1]['avitime'])
10000 loops, best of 3: 92.6 µs per loop
In [225]: %timeit (df.avitime.iat[-1])
100000 loops, best of 3: 14.9 µs per loop
In [226]: %timeit (df.avitime.iloc[-1])
10000 loops, best of 3: 21.3 µs per loop
In [227]: %timeit (df.get_value(df.index[-1], 'avitime'))
100000 loops, best of 3: 5.87 µs per loop
More information about get_value.
Upvotes: 0
Reputation: 11895
Replace
df.tail(1).avitime.values[0]
with :
df.tail(1).avitime.values[0]
# Or
df.iloc[-1]['avitime']
Currently, you're returning a pd.Series
, not a scalar:
In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10,2), columns=['A','avitime'])
In [2]: type(df.tail(1).avitime)
Out[2]: pandas.core.series.Series
Upvotes: 1