jivers
jivers

Reputation: 1000

Create DataFrames or Dictionaries from Unique Values in Separate Columns

I'm a Python newbie trying to model stock trades from a DataFrame containing timestamped trade executions. For example, index 0-2 below represent three executions of a single trade. I'm trying to isolate each group of executions that represent a trade. Here's a sample of the existing aggregate:

     event  side symbol  shares     price      time  trade_date
0  Execute  Shrt   TSLA      25  311.9500  10:29:51  2019-01-19
1  Execute   Buy   TSLA      20  310.7300  10:30:42  2019-01-19
2  Execute   Buy   TSLA       5  311.1173  10:31:15  2019-01-19
3  Execute   Buy   BYND      25   83.3027  11:06:15  2019-01-19
4  Execute  Shrt   BYND      13   84.0500  11:07:11  2019-01-19
5  Execute  Sell   BYND      12   83.2500  11:07:42  2019-01-19
6  Execute   Buy   NVDA      25  297.3400  12:07:42  2019-01-20
7  Execute  Shrt   AMZN      10  500.0100  12:09:12  2019-01-20
8  Execute  Sell   NVDA      25  296.7500  12:10:30  2019-01-20
9  Execute   Buy   AMZN      10  495.7500  12:11:15  2019-01-20

The approach in this post creates slices from unique values in a single column, but I'm unsure of how to make the second slice. With this approach applied, I have:

date_list = list(set(execs_df['trade_date'])) # Create list of dates from original DataFrame

by_date_dict = {date: execs_df.loc[execs_df['trade_date'] == date] for date in date_list}

for date in date_list:
    print(by_date__dict[date])

This produces the following, date-specific dictionaries:

      side symbol  shares     price      time    trade_date  p & l     trades      value  
0     Shrt   TSLA    25.0  311.9500  10:29:51    2019-11-01    NaN        NaN  7798.7500  
1      Buy   TSLA     8.0  311.2000  10:30:31    2019-11-01    NaN        NaN  2489.6000  
2      Buy   TSLA     8.0  310.7300  10:30:42    2019-11-01    NaN        NaN  2485.8400  
3      Buy   TSLA     4.0  311.1173  10:31:15    2019-11-01    NaN        NaN  1244.4692  
4      Buy   TSLA     5.0  311.5500  10:35:39    2019-11-01    NaN        NaN  1557.7500 
5     Shrt   BYND    25.0   83.3027  11:06:15    2019-11-01    NaN        NaN  2082.5675  
6      Buy   BYND    12.0   83.0500  11:06:43    2019-11-01    NaN        NaN   996.6000  
7      Buy   BYND    13.0   83.2400  11:07:49    2019-11-01    NaN        NaN  1082.1200  

In terms of final output, I need the following:

      side symbol  shares     price      time    trade_date  p & l     trades      value  
0     Shrt   TSLA    25.0  311.9500  10:29:51    2019-11-01    NaN        NaN  7798.7500  
1      Buy   TSLA     8.0  311.2000  10:30:31    2019-11-01    NaN        NaN  2489.6000  
2      Buy   TSLA     8.0  310.7300  10:30:42    2019-11-01    NaN        NaN  2485.8400  
3      Buy   TSLA     4.0  311.1173  10:31:15    2019-11-01    NaN        NaN  1244.4692  
4      Buy   TSLA     5.0  311.5500  10:35:39    2019-11-01    NaN        NaN  1557.7500

      side symbol  shares     price      time    trade_date  p & l     trades      value
0     Shrt   BYND    25.0   83.3027  11:06:15    2019-11-01    NaN        NaN  2082.5675  
1      Buy   BYND    12.0   83.0500  11:06:43    2019-11-01    NaN        NaN   996.6000  
2      Buy   BYND    13.0   83.2400  11:07:49    2019-11-01    NaN        NaN  1082.1200  

etc...

Any pointers would be greatly appreciated.

Upvotes: 1

Views: 32

Answers (1)

Trenton McKinney
Trenton McKinney

Reputation: 62393

  • Given your current dictionary of dataframes by_date_dict
  • The following code we be a dict of dict of dataframes
    • Top key is still the date
    • Under each data key is a key for the symbol (e.g. updated_df['2019-11-01']['BYND'])
updated_df = {k: {sym: v[v.symbol == sym] for sym in v.symbol.unique()} for k, v in by_date_dict.items()}

# structure
{date: {symbol: df,
        symbol: df,
        symbol: df}
 date: {symbol, df,
        symbol, df,
        symbol, df}}

for k, v in updated_df.items():
    print(k)
    for x, y in v.items():
        print(x)
        print(y.to_markdown())

2019-11-01
TSLA
|    | side   | symbol   |   shares |   price | time     | trade_date   |   pl |   trades |   value |
|---:|:-------|:---------|---------:|--------:|:---------|:-------------|-----:|---------:|--------:|
|  0 | Shrt   | TSLA     |       25 | 311.95  | 10:29:51 | 2019-11-01   |  nan |      nan | 7798.75 |
|  1 | Buy    | TSLA     |        8 | 311.2   | 10:30:31 | 2019-11-01   |  nan |      nan | 2489.6  |
|  2 | Buy    | TSLA     |        8 | 310.73  | 10:30:42 | 2019-11-01   |  nan |      nan | 2485.84 |
|  3 | Buy    | TSLA     |        4 | 311.117 | 10:31:15 | 2019-11-01   |  nan |      nan | 1244.47 |
|  4 | Buy    | TSLA     |        5 | 311.55  | 10:35:39 | 2019-11-01   |  nan |      nan | 1557.75 |
BYND
|    | side   | symbol   |   shares |   price | time     | trade_date   |   pl |   trades |   value |
|---:|:-------|:---------|---------:|--------:|:---------|:-------------|-----:|---------:|--------:|
|  5 | Shrt   | BYND     |       25 | 83.3027 | 11:06:15 | 2019-11-01   |  nan |      nan | 2082.57 |
|  6 | Buy    | BYND     |       12 | 83.05   | 11:06:43 | 2019-11-01   |  nan |      nan |  996.6  |
|  7 | Buy    | BYND     |       13 | 83.24   | 11:07:49 | 2019-11-01   |  nan |      nan | 1082.12 |

Access specific key

updated_df['2019-11-01']['BYND']

|    | side   | symbol   |   shares |   price | time     | trade_date   |   pl |   trades |   value |
|---:|:-------|:---------|---------:|--------:|:---------|:-------------|-----:|---------:|--------:|
|  5 | Shrt   | BYND     |       25 | 83.3027 | 11:06:15 | 2019-11-01   |  nan |      nan | 2082.57 |
|  6 | Buy    | BYND     |       12 | 83.05   | 11:06:43 | 2019-11-01   |  nan |      nan |  996.6  |
|  7 | Buy    | BYND     |       13 | 83.24   | 11:07:49 | 2019-11-01   |  nan |      nan | 1082.12 |

Upvotes: 2

Related Questions