Reputation: 643
I have a csv file with date, time., price, mag, signal. 62035 rows; there are 42 times of day associated to each unique date in the file.
For each date, when there is an 'S' in the signal column append the corresponding price at the time the 'S' occurred. Below is the attempt.
from pandas import *
from numpy import *
from io import *
from os import *
from sys import *
DF1 = read_csv('___.csv')
idf=DF1.set_index(['date','time','price'],inplace=True)
sStore=[]
for i in idf.index[i][0]:
sStore.append([idf.index[j][2] for j in idf[j][1] if idf['signal']=='S'])
sStore.head()
Traceback (most recent call last)
<ipython-input-7-8769220929e4> in <module>()
1 sStore=[]
2
----> 3 for time in idf.index[i][0]:
4
5 sStore.append([idf.index[j][2] for j in idf[j][1] if idf['signal']=='S'])
NameError: name 'i' is not defined
I do not understand why the i index is not permitted here. Thanks.
I also think it's strange that :
idf.index.levels[0] will show the dates "not parsed" as it is in the file but out of order. Despite that parse_date=True as an argument in set_index.
I bring this up since I was thinking of side swiping the problem with something like:
for i in idf.index.levels[0]:
sStore.append([idf.index[j][2] for j in idf.index.levels[1] if idf['signal']=='S'])
sStore.head()
My edit 12/30/2012 based on DSM's comment below:
I would like to use your idea to get the P&L, as I commented below. Where if S!=B, for any given date, we difference using the closing time, 1620.
v=[df["signal"]=="S"]
t=[df["time"]=="1620"]
u=[df["signal"]!="S"]
df["price"][[v and (u and t)]]
That is, "give me the price at 1620; (even when it doesn't give a "sell signal", S) so that I can diff. with the "extra B's"--for the special case where B>S. This ignores the symmetric concern (where S>B) but for now I want to understand this logical issue.
On traceback, this expression gives:
ValueError: boolean index array should have 1 dimension
Note that in order to invoke df["time'] I do not set_index here. Trying the union operator | gives:
TypeError: unsupported operand type(s) for |: 'list' and 'list'
Looking at Max Fellows's approach,
@Max Fellows
The point is to close out the positions at the end of the day; so we need to capture to price at the close to "unload" all those B, S which were accumulated; but didn't net each other out. If I say:
filterFunc1 = lambda row: row["signal"] == "S" and ([row["signal"] != "S"][row["price"]=="1620"])
filterFunc2 =lambda row: ([row["price"]=="1620"][row["signal"] != "S"])
filterFunc=filterFunc1 and filterFunc2
filteredData = itertools.ifilter(filterFunc, reader)
On traceback:
IndexError: list index out of range
Upvotes: 2
Views: 731
Reputation: 353039
Using @Max Fellows' handy example data, we can have a look at it in pandas
. [BTW, you should always try to provide a short, self-contained, correct example (see here for more details), so that the people trying to help you don't have to spend time coming up with one.]
First, import pandas as pd
. Then:
In [23]: df = pd.read_csv("sample.csv", names="date time price mag signal".split())
In [24]: df.set_index(["date", "time"], inplace=True)
which gives me
In [25]: df
Out[25]:
price mag signal
date time
12/28/2012 1:30 10 foo S
2:15 11 bar N
3:00 12 baz S
4:45 13 fibble N
5:30 14 whatsit S
6:15 15 bobs N
7:00 16 widgets S
7:45 17 weevils N
8:30 18 badger S
9:15 19 moose S
11/29/2012 1:30 10 foo N
2:15 11 bar N
3:00 12 baz S
4:45 13 fibble N
5:30 14 whatsit N
6:15 15 bobs N
7:00 16 widgets S
7:45 17 weevils N
8:30 18 badger N
9:15 19 moose N
[etc.]
We can see which rows have a signal of S
easily:
In [26]: df["signal"] == "S"
Out[26]:
date time
12/28/2012 1:30 True
2:15 False
3:00 True
4:45 False
5:30 True
6:15 False
[etc..]
and we can select using this too:
In [27]: df["price"][df["signal"] == "S"]
Out[27]:
date time
12/28/2012 1:30 10
3:00 12
5:30 14
7:00 16
8:30 18
9:15 19
11/29/2012 3:00 12
7:00 16
12/29/2012 3:00 12
7:00 16
8/9/2008 3:00 12
7:00 16
Name: price
This is a DataFrame
with every date, time, and price where there's an S
. And if you simply want a list:
In [28]: list(df["price"][df["signal"] == "S"])
Out[28]: [10.0, 12.0, 14.0, 16.0, 18.0, 19.0, 12.0, 16.0, 12.0, 16.0, 12.0, 16.0]
Update:
v=[df["signal"]=="S"]
makes v
a Python list
containing a Series
. That's not what you want. df["price"][[v and (u and t)]]
doesn't make much sense to me either --: v
and u
are mutually exclusive, so if you and them together, you'll get nothing. For these logical vector ops you can use &
and |
instead of and
and or
. Using the reference data again:
In [85]: import pandas as pd
In [86]: df = pd.read_csv("sample.csv", names="date time price mag signal".split())
In [87]: v=df["signal"]=="S"
In [88]: t=df["time"]=="4:45"
In [89]: u=df["signal"]!="S"
In [90]: df[t]
Out[90]:
date time price mag signal
3 12/28/2012 4:45 13 fibble N
13 11/29/2012 4:45 13 fibble N
23 12/29/2012 4:45 13 fibble N
33 8/9/2008 4:45 13 fibble N
In [91]: df["price"][t]
Out[91]:
3 13
13 13
23 13
33 13
Name: price
In [92]: df["price"][v | (u & t)]
Out[92]:
0 10
2 12
3 13
4 14
6 16
8 18
9 19
12 12
13 13
16 16
22 12
23 13
26 16
32 12
33 13
36 16
Name: price
[Note: this question has now become too long and meandering. I suggest spending some time working through the examples in the pandas
documentation at the console to get a feel for it.]
Upvotes: 2
Reputation: 392
This is what I think you're trying to accomplish based on your edit: for every date in your CSV file, group the date along with a list of prices for each item with a signal of "S".
You didn't include any sample data in your question, so I made a test one that I hope matches the format you described:
12/28/2012,1:30,10.00,"foo","S"
12/28/2012,2:15,11.00,"bar","N"
12/28/2012,3:00,12.00,"baz","S"
12/28/2012,4:45,13.00,"fibble","N"
12/28/2012,5:30,14.00,"whatsit","S"
12/28/2012,6:15,15.00,"bobs","N"
12/28/2012,7:00,16.00,"widgets","S"
12/28/2012,7:45,17.00,"weevils","N"
12/28/2012,8:30,18.00,"badger","S"
12/28/2012,9:15,19.00,"moose","S"
11/29/2012,1:30,10.00,"foo","N"
11/29/2012,2:15,11.00,"bar","N"
11/29/2012,3:00,12.00,"baz","S"
11/29/2012,4:45,13.00,"fibble","N"
11/29/2012,5:30,14.00,"whatsit","N"
11/29/2012,6:15,15.00,"bobs","N"
11/29/2012,7:00,16.00,"widgets","S"
11/29/2012,7:45,17.00,"weevils","N"
11/29/2012,8:30,18.00,"badger","N"
11/29/2012,9:15,19.00,"moose","N"
12/29/2012,1:30,10.00,"foo","N"
12/29/2012,2:15,11.00,"bar","N"
12/29/2012,3:00,12.00,"baz","S"
12/29/2012,4:45,13.00,"fibble","N"
12/29/2012,5:30,14.00,"whatsit","N"
12/29/2012,6:15,15.00,"bobs","N"
12/29/2012,7:00,16.00,"widgets","S"
12/29/2012,7:45,17.00,"weevils","N"
12/29/2012,8:30,18.00,"badger","N"
12/29/2012,9:15,19.00,"moose","N"
8/9/2008,1:30,10.00,"foo","N"
8/9/2008,2:15,11.00,"bar","N"
8/9/2008,3:00,12.00,"baz","S"
8/9/2008,4:45,13.00,"fibble","N"
8/9/2008,5:30,14.00,"whatsit","N"
8/9/2008,6:15,15.00,"bobs","N"
8/9/2008,7:00,16.00,"widgets","S"
8/9/2008,7:45,17.00,"weevils","N"
8/9/2008,8:30,18.00,"badger","N"
8/9/2008,9:15,19.00,"moose","N"
And here's a method using Python 2.7 and built-in libraries to group it in the way it sounds like you want:
import csv
import itertools
import time
from collections import OrderedDict
with open("sample.csv", "r") as file:
reader = csv.DictReader(file,
fieldnames=["date", "time", "price", "mag", "signal"])
# Reduce the size of the data set by filtering out the non-"S" rows.
filterFunc = lambda row: row["signal"] == "S"
filteredData = itertools.ifilter(filterFunc, reader)
# Sort by date so we can use the groupby function.
dateKeyFunc = lambda row: time.strptime(row["date"], r"%m/%d/%Y")
sortedData = sorted(filteredData, key=dateKeyFunc)
# Group by date: create a new dictionary of date to a list of prices.
datePrices = OrderedDict((date, [row["price"] for row in rows])
for date, rows
in itertools.groupby(sortedData, dateKeyFunc))
for date, prices in datePrices.iteritems():
print "{0}: {1}".format(time.strftime(r"%m/%d/%Y", date),
", ".join(str(price) for price in prices))
>>> 08/09/2008: 12.00, 16.00
>>> 11/29/2012: 12.00, 16.00
>>> 12/28/2012: 10.00, 12.00, 14.00, 16.00, 18.00, 19.00
>>> 12/29/2012: 12.00, 16.00
The type conversions are up to you, since you may be using other libraries to do your CSV reading, but that should hopefully get you started -- and take careful note of @DSM's comment about import *.
Upvotes: 2
Reputation: 236004
Try something like this:
for i in range(len(idf.index)):
value = idf.index[i][0]
Same thing for the iteration with the j
index variable. As has been pointed, you can't reference the iteration index in the expression to be iterated, and besides you need to perform a very specific iteration (traversing over a column in a matrix), and Python's default iterators won't work "out of the box" for this, so a custom index handling is needed here.
Upvotes: 2
Reputation: 22728
It's because i
is not yet defined, just like the error message says.
In this line:
for i in idf.index[i][0]:
You are telling the Python interpreter to iterate over all the values yielded by the list returning from the expression idf.index[i][0]
but you have not yet defined what i
is (although you are attempting to set each item in the list to the variable i
as well).
The way the Python for ... in ...
loop works is that it takes the right most component and asks for the next
item from the iterator. It then assigns the value yielded by the call to the variable name provided on the left hand side.
Upvotes: 1