Reputation: 12341
I want to iterate over a dataframe's major axis date by date.
Example:
tdf = df.ix[date]
The issue I am having is that the type returned by df.ix
changes, leaving me with 3 possible situations
If the date does not exist in tdf
an error is thrown: KeyError: 1394755200000000000
If there is only one item in tdf
: print type(tdf)
returns
<class 'pandas.core.series.Series'>
If there is more than one item in tdf
: print type(tdf)
returns
<class 'pandas.core.frame.DataFrame'>
To avoid the first case I can simply wrap this in a try catch
block or thanks to jxstanford, I can avoid the try catch block by using if date in df.index:
I run into the issue afterwards with an inconsistent API with a pandas series and a pandas data frame. I could solve this by checking for types but it seems I shouldn't have to do that. I would ideally like to keep the types the same. Is there a better way of doing this?
I'm running pandas 0.13.1 and I am currently loading my data from a CSV using
Here's a full example demonstrating the problem.
from pandas import DataFrame
import datetime
path_to_csv = '/home/n/Documents/port/test.csv'
df = DataFrame.from_csv(path_to_csv, index_col=3, header=0, parse_dates=True, sep=',')
start_dt = df.index.min()
end_dt = df.index.max()
dt_step = datetime.timedelta(days=1)
df.sort_index(inplace=True)
cur_dt = start_dt
while cur_dt != end_dt:
if cur_dt in df.index:
print type(df.ix[cur_dt])
#run some other steps using cur_dt
cur_dt += dt_step
An example CSV that demonstrates the problem is as follows:
value1,value2,value3,Date,type
1,2,4,03/13/14,a
2,3,3,03/21/14,b
3,4,2,03/21/14,a
4,5,1,03/27/14,b
The above code prints out
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
Is it possible to get the value of value1
from tdf in a consistent manner? or am I stuck making an if statement for and separately handle each case?
if type(df.ix[cur_dt]) == DataFrame:
....
if type(df.ix[cur_dt]) == Series:
....
Upvotes: 3
Views: 5384
Reputation: 15718
This toy code will return DataFrames consistently.
def framer(rows):
if ndim(rows) == 1:
return rows.to_frame().T
else:
return rows
for cur_date in df.index:
print type(framer(df.ix[cur_date]))
And this will give you the missing days:
df.resample(rule='D')
Have a look at the resample method docstring. It has its own options to fill up the missing data. And if you decide to make your multiple dates into a single one, the method you're looking at is groupby (if you want to combine values across rows) and drop_duplicates (if you want to ignore them). There is no need to reinvent the wheel.
Upvotes: 1
Reputation: 2555
You can use the apply
method of the DataFrame, using axis = 1
to work on each row of the DataFrame to build a Series with the same Index.
e.g.
def calculate_value(row):
if row.date == pd.datetime(2014,3,21):
return 0
elif row.type == 'a':
return row.value1 + row.value2 + row.value3
else:
return row.value1 * row.value2 * row.value3
df['date'] = df.index
df['NewValue'] = df.apply(calculate_value, axis=1)
modifies your example input as follows
value1 value2 value3 type NewValue date
Date
2014-03-13 1 2 4 a 7 2014-03-13
2014-03-21 2 3 3 b 0 2014-03-21
2014-03-21 3 4 2 a 0 2014-03-21
2014-03-27 4 5 1 b 20 2014-03-27
[4 rows x 6 columns]
Upvotes: 1
Reputation: 3387
Not sure what your trying to do with the dataframe, but this might be better than a try/except:
tdf = DataFrame.from_csv(path_to_csv, index_col=3, header=0, parse_dates=True, sep=',')
while cur_dt != end_dt:
if cur_dt in df.index:
# do your thing
cur_dt += dt_step
Upvotes: 2