Reputation: 27
I have the following data in a DataFrame called result (the code that builds it is below):
          size   price                 time            orig_time
0          4.0  9284.5  2019-11-03 02:47:00  2019-11-03 02:42:00
1          4.0  9284.5  2019-11-03 02:47:00  2019-11-03 02:42:00
2          4.0  9284.5  2019-11-03 02:47:00  2019-11-03 02:42:00
3          4.0  9284.5  2019-11-03 02:47:00  2019-11-03 02:42:00
4          4.0  9284.5  2019-11-03 02:47:00  2019-11-03 02:42:00
...        ...     ...                  ...                  ...
10444908   4.0  9323.5  2019-11-05 05:48:00  2019-11-05 05:43:00
10444909   4.0  9321.0  2019-11-05 05:48:00  2019-11-05 05:43:00
10444910   4.0  9323.5  2019-11-05 05:48:00  2019-11-05 05:43:00
10444911   4.0  9321.0  2019-11-05 05:48:00  2019-11-05 05:43:00
10444912   4.0  9323.5  2019-11-05 05:48:00  2019-11-05 05:43:00
I want to get the last price for each 'time' and put that data into a new DataFrame. For example, for the data above I just want to see:
          size   price                 time            orig_time
4          4.0  9284.5  2019-11-03 02:47:00  2019-11-03 02:42:00
...        ...     ...                  ...                  ...
10444912   4.0  9323.5  2019-11-05 05:48:00  2019-11-05 05:43:00
The code I have for these outputs is the following:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import csv
headers = ['ticker', 'size', 'price', 'unix','type','time']
dtypes = {'ticker': 'str', 'size': 'float', 'price': 'float', 'unix': 'float','type': 'str','time': 'str'}
parse_dates = ['time']
btcnow = pd.read_csv('new 113-115.csv', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)
now3 = pd.DataFrame(btcnow, columns=['size','time','unix','price'])
from datetime import datetime, timedelta
time_interval = timedelta(minutes = 5)
df = now3[[ 'time', 'size', 'price']]
# extract time and size for the merge; copy so the assignment below
# does not modify a slice of df (avoids SettingWithCopyWarning)
df_time_size = df[['time', 'size']].copy()
df_time_size['time'] = df_time_size['time'] + time_interval
# inner join dataframe by size&time
df = df_time_size.merge(df[['time', 'size', 'price']], how = 'inner')
df['orig_time'] = df['time'] - time_interval
# keep only the rows where size == 4, with the columns in display order
result = df.loc[df["size"] == 4, ["size", "price", "time", "orig_time"]]
I tried the following, but it gives me only one row with the last price. I need a row for every time at which size 4 comes up, with just the last price at that time (or the average at that time, whichever is possible):
df = df.groupby('size').apply(lambda x: x.sort_values('time', ascending=False).head(1)).reset_index(drop=True)
Upvotes: 0
Views: 58
Reputation: 29742
Using pandas.DataFrame.groupby.last:
new_df = df.groupby('time').last().reset_index()
print(new_df)
Output:
                  time  size   price            orig_time
0  2019-11-03 02:47:00   4.0  9284.5  2019-11-03 02:42:00
1  2019-11-05 05:48:00   4.0  9323.5  2019-11-05 05:43:00
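If you would rather keep the original row labels (4 and 10444912 in the desired output), or take the average price per time instead of the last one, something along these lines should work. This is only a minimal sketch on made-up data: the toy prices and the names last_rows and mean_price are assumptions for illustration, not values from the CSV.

```python
import pandas as pd

# Toy stand-in for the question's data (illustrative values, not from the CSV).
df = pd.DataFrame({
    "size": [4.0, 4.0, 4.0, 4.0],
    "price": [9284.5, 9290.0, 9321.0, 9323.5],
    "time": pd.to_datetime([
        "2019-11-03 02:47:00", "2019-11-03 02:47:00",
        "2019-11-05 05:48:00", "2019-11-05 05:48:00",
    ]),
})
df["orig_time"] = df["time"] - pd.Timedelta(minutes=5)

# Last row for each time, keeping the original index labels.
last_rows = df.drop_duplicates(subset="time", keep="last")

# Average price for each time instead of the last one.
mean_price = df.groupby("time", as_index=False)["price"].mean()

print(last_rows)
print(mean_price)
```

Both variants assume the rows are already in the order you care about; drop_duplicates with keep="last" simply keeps the final occurrence of each time as it appears in the frame.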
Upvotes: 2