Don Juan
Don Juan

Reputation: 1

how to slice a DataFrame in pandas with rows and columns slicing involved

I'm kind of a rookie in python so sorry in advance if I'll raise trivial questions.

I need to create a DataFrame from existing one. my new df will take input from the user and should return a specific new data frame. In my raw data, I have 5 columns of stock info : 'open', 'close', 'high', 'low' and 'volume', and a 'timestamp' column which I already converted to datetime properly.

my user is given the option to pick one or more of the 5 data columns (i.e 'open', 'close', 'high', 'low' and 'volume'), A ticker name and a range of preferable dates. ( I succeeded with the ticker name and the time) this is my output so far now i need to create a df that has another column - 'Data Type', that gets the request of the user, so my output shoud look, for instance, something like this:

 timestamp      ticker_name    data type      
2018-02-26         HP            67.35
2018-02-23         HP            701271

if the user asked to see only the 'open' and the 'volume' values of HP between

2018-02-23  to 2018-02-26 .

I've been struggling with this for Hours. any help will be highy appreciated! Thanks in advance!!

Upvotes: 0

Views: 2547

Answers (3)

Yilun Zhang
Yilun Zhang

Reputation: 9008

It's a simple slice of the dataframe:

s = "2018-01-17"
e = "2018-02-24"
_type = "open"
ticker = "HP"
df[(df.index >= s) & (df.index <= e) & (df.ticker_name == ticker)][["ticker_name",_type]]

Upvotes: 0

piRSquared
piRSquared

Reputation: 294218

You're going to want a combination of things. IMO the simplest way is to set the Timestamp column to be the index so you can use the timestamp slicing. Then use loc to do the rest.

df.set_index('Timestamp')[start:end].loc[df.Ticker == 'HP', ['open', 'volume']]

Upvotes: 0

n3utrino
n3utrino

Reputation: 1210

First off, be careful because "open" has its own meaning in python. Make sure you are treating it like a string by using quotes.

If you want only columns ['open','close'], you would do return_df = df[['open','close']].

If you want only rows with timestamps between 2018-02-23 to 2018-02-26 you can do return_df = df[(df.timestamp >= '2018-02-03') & (df.timestamp <= '2018-02-26')].

Hope this helps.

Upvotes: 1

Related Questions