Michele Reilly
Michele Reilly

Reputation: 643

List Comprehension Loop

I have a csv file with date, time., price, mag, signal. 62035 rows; there are 42 times of day associated to each unique date in the file.

For each date, when there is an 'S' in the signal column append the corresponding price at the time the 'S' occurred. Below is the attempt.

from pandas import *
from numpy import *
from io import *
from os import *
from sys import *

DF1 = read_csv('___.csv')
idf=DF1.set_index(['date','time','price'],inplace=True)
sStore=[]
for i in idf.index[i][0]:
  sStore.append([idf.index[j][2] for j in idf[j][1] if idf['signal']=='S'])
sStore.head()    
Traceback (most recent call last)
<ipython-input-7-8769220929e4> in <module>()
  1 sStore=[]
  2 
----> 3 for time in idf.index[i][0]:
  4 
  5     sStore.append([idf.index[j][2] for j in idf[j][1] if idf['signal']=='S'])

 NameError: name 'i' is not defined

I do not understand why the i index is not permitted here. Thanks.

I also think it's strange that :

idf.index.levels[0] will show the dates "not parsed" as it is in the file but out of order. Despite that parse_date=True as an argument in set_index.

I bring this up since I was thinking of side swiping the problem with something like:

for i in idf.index.levels[0]:

   sStore.append([idf.index[j][2] for j in idf.index.levels[1] if idf['signal']=='S'])

 sStore.head() 

My edit 12/30/2012 based on DSM's comment below:

I would like to use your idea to get the P&L, as I commented below. Where if S!=B, for any given date, we difference using the closing time, 1620.

v=[df["signal"]=="S"]
t=[df["time"]=="1620"]
u=[df["signal"]!="S"]

df["price"][[v and (u and t)]]

That is, "give me the price at 1620; (even when it doesn't give a "sell signal", S) so that I can diff. with the "extra B's"--for the special case where B>S. This ignores the symmetric concern (where S>B) but for now I want to understand this logical issue.

On traceback, this expression gives:

ValueError: boolean index array should have 1 dimension

Note that in order to invoke df["time'] I do not set_index here. Trying the union operator | gives:

TypeError: unsupported operand type(s) for |: 'list' and 'list'

Looking at Max Fellows's approach,

@Max Fellows

The point is to close out the positions at the end of the day; so we need to capture to price at the close to "unload" all those B, S which were accumulated; but didn't net each other out. If I say:

filterFunc1 = lambda row: row["signal"] == "S" and ([row["signal"] != "S"][row["price"]=="1620"])
filterFunc2 =lambda row: ([row["price"]=="1620"][row["signal"] != "S"])

filterFunc=filterFunc1 and filterFunc2

filteredData = itertools.ifilter(filterFunc, reader)

On traceback:

IndexError: list index out of range

Upvotes: 2

Views: 731

Answers (4)

DSM
DSM

Reputation: 353039

Using @Max Fellows' handy example data, we can have a look at it in pandas. [BTW, you should always try to provide a short, self-contained, correct example (see here for more details), so that the people trying to help you don't have to spend time coming up with one.]

First, import pandas as pd. Then:

In [23]: df = pd.read_csv("sample.csv", names="date time price mag signal".split())

In [24]: df.set_index(["date", "time"], inplace=True)

which gives me

In [25]: df
Out[25]: 
                 price      mag signal
date       time                       
12/28/2012 1:30     10      foo      S
           2:15     11      bar      N
           3:00     12      baz      S
           4:45     13   fibble      N
           5:30     14  whatsit      S
           6:15     15     bobs      N
           7:00     16  widgets      S
           7:45     17  weevils      N
           8:30     18   badger      S
           9:15     19    moose      S
11/29/2012 1:30     10      foo      N
           2:15     11      bar      N
           3:00     12      baz      S
           4:45     13   fibble      N
           5:30     14  whatsit      N
           6:15     15     bobs      N
           7:00     16  widgets      S
           7:45     17  weevils      N
           8:30     18   badger      N
           9:15     19    moose      N
[etc.]

We can see which rows have a signal of S easily:

In [26]: df["signal"] == "S"
Out[26]: 
date        time
12/28/2012  1:30     True
            2:15    False
            3:00     True
            4:45    False
            5:30     True
            6:15    False
[etc..]

and we can select using this too:

In [27]: df["price"][df["signal"] == "S"]
Out[27]: 
date        time
12/28/2012  1:30    10
            3:00    12
            5:30    14
            7:00    16
            8:30    18
            9:15    19
11/29/2012  3:00    12
            7:00    16
12/29/2012  3:00    12
            7:00    16
8/9/2008    3:00    12
            7:00    16
Name: price

This is a DataFrame with every date, time, and price where there's an S. And if you simply want a list:

In [28]: list(df["price"][df["signal"] == "S"])
Out[28]: [10.0, 12.0, 14.0, 16.0, 18.0, 19.0, 12.0, 16.0, 12.0, 16.0, 12.0, 16.0]

Update:

v=[df["signal"]=="S"] makes v a Python list containing a Series. That's not what you want. df["price"][[v and (u and t)]] doesn't make much sense to me either --: v and u are mutually exclusive, so if you and them together, you'll get nothing. For these logical vector ops you can use & and | instead of and and or. Using the reference data again:

In [85]: import pandas as pd

In [86]: df = pd.read_csv("sample.csv", names="date time price mag signal".split())

In [87]: v=df["signal"]=="S"

In [88]: t=df["time"]=="4:45"

In [89]: u=df["signal"]!="S"

In [90]: df[t]
Out[90]: 
          date  time  price     mag signal
3   12/28/2012  4:45     13  fibble      N
13  11/29/2012  4:45     13  fibble      N
23  12/29/2012  4:45     13  fibble      N
33    8/9/2008  4:45     13  fibble      N

In [91]: df["price"][t]
Out[91]: 
3     13
13    13
23    13
33    13
Name: price

In [92]: df["price"][v | (u & t)]
Out[92]: 
0     10
2     12
3     13
4     14
6     16
8     18
9     19
12    12
13    13
16    16
22    12
23    13
26    16
32    12
33    13
36    16
Name: price

[Note: this question has now become too long and meandering. I suggest spending some time working through the examples in the pandas documentation at the console to get a feel for it.]

Upvotes: 2

Max Fellows
Max Fellows

Reputation: 392

This is what I think you're trying to accomplish based on your edit: for every date in your CSV file, group the date along with a list of prices for each item with a signal of "S".

You didn't include any sample data in your question, so I made a test one that I hope matches the format you described:

12/28/2012,1:30,10.00,"foo","S"
12/28/2012,2:15,11.00,"bar","N"
12/28/2012,3:00,12.00,"baz","S"
12/28/2012,4:45,13.00,"fibble","N"
12/28/2012,5:30,14.00,"whatsit","S"
12/28/2012,6:15,15.00,"bobs","N"
12/28/2012,7:00,16.00,"widgets","S"
12/28/2012,7:45,17.00,"weevils","N"
12/28/2012,8:30,18.00,"badger","S"
12/28/2012,9:15,19.00,"moose","S"
11/29/2012,1:30,10.00,"foo","N"
11/29/2012,2:15,11.00,"bar","N"
11/29/2012,3:00,12.00,"baz","S"
11/29/2012,4:45,13.00,"fibble","N"
11/29/2012,5:30,14.00,"whatsit","N"
11/29/2012,6:15,15.00,"bobs","N"
11/29/2012,7:00,16.00,"widgets","S"
11/29/2012,7:45,17.00,"weevils","N"
11/29/2012,8:30,18.00,"badger","N"
11/29/2012,9:15,19.00,"moose","N"
12/29/2012,1:30,10.00,"foo","N"
12/29/2012,2:15,11.00,"bar","N"
12/29/2012,3:00,12.00,"baz","S"
12/29/2012,4:45,13.00,"fibble","N"
12/29/2012,5:30,14.00,"whatsit","N"
12/29/2012,6:15,15.00,"bobs","N"
12/29/2012,7:00,16.00,"widgets","S"
12/29/2012,7:45,17.00,"weevils","N"
12/29/2012,8:30,18.00,"badger","N"
12/29/2012,9:15,19.00,"moose","N"
8/9/2008,1:30,10.00,"foo","N"
8/9/2008,2:15,11.00,"bar","N"
8/9/2008,3:00,12.00,"baz","S"
8/9/2008,4:45,13.00,"fibble","N"
8/9/2008,5:30,14.00,"whatsit","N"
8/9/2008,6:15,15.00,"bobs","N"
8/9/2008,7:00,16.00,"widgets","S"
8/9/2008,7:45,17.00,"weevils","N"
8/9/2008,8:30,18.00,"badger","N"
8/9/2008,9:15,19.00,"moose","N"

And here's a method using Python 2.7 and built-in libraries to group it in the way it sounds like you want:

import csv
import itertools
import time
from collections import OrderedDict

with open("sample.csv", "r") as file:
    reader = csv.DictReader(file,
                            fieldnames=["date", "time", "price", "mag", "signal"])

    # Reduce the size of the data set by filtering out the non-"S" rows.
    filterFunc = lambda row: row["signal"] == "S"
    filteredData = itertools.ifilter(filterFunc, reader)

    # Sort by date so we can use the groupby function.
    dateKeyFunc = lambda row: time.strptime(row["date"], r"%m/%d/%Y")
    sortedData = sorted(filteredData, key=dateKeyFunc)

    # Group by date: create a new dictionary of date to a list of prices.
    datePrices = OrderedDict((date, [row["price"] for row in rows])
                             for date, rows
                             in itertools.groupby(sortedData, dateKeyFunc))

for date, prices in datePrices.iteritems():
    print "{0}: {1}".format(time.strftime(r"%m/%d/%Y", date),
                            ", ".join(str(price) for price in prices))

>>> 08/09/2008: 12.00, 16.00
>>> 11/29/2012: 12.00, 16.00
>>> 12/28/2012: 10.00, 12.00, 14.00, 16.00, 18.00, 19.00
>>> 12/29/2012: 12.00, 16.00

The type conversions are up to you, since you may be using other libraries to do your CSV reading, but that should hopefully get you started -- and take careful note of @DSM's comment about import *.

Upvotes: 2

&#211;scar L&#243;pez
&#211;scar L&#243;pez

Reputation: 236004

Try something like this:

for i in range(len(idf.index)):
  value = idf.index[i][0]

Same thing for the iteration with the j index variable. As has been pointed, you can't reference the iteration index in the expression to be iterated, and besides you need to perform a very specific iteration (traversing over a column in a matrix), and Python's default iterators won't work "out of the box" for this, so a custom index handling is needed here.

Upvotes: 2

tkone
tkone

Reputation: 22728

It's because i is not yet defined, just like the error message says.

In this line:

for i in idf.index[i][0]:

You are telling the Python interpreter to iterate over all the values yielded by the list returning from the expression idf.index[i][0] but you have not yet defined what i is (although you are attempting to set each item in the list to the variable i as well).

The way the Python for ... in ... loop works is that it takes the right most component and asks for the next item from the iterator. It then assigns the value yielded by the call to the variable name provided on the left hand side.

Upvotes: 1

Related Questions