Monica W

Reputation: 27

How to get last value for a specific case in a dataframe

I have the following data in the DataFrame result (the code that builds it is below):

          size   price                time           orig_time
0          4.0  9284.5 2019-11-03 02:47:00 2019-11-03 02:42:00
1          4.0  9284.5 2019-11-03 02:47:00 2019-11-03 02:42:00
2          4.0  9284.5 2019-11-03 02:47:00 2019-11-03 02:42:00
3          4.0  9284.5 2019-11-03 02:47:00 2019-11-03 02:42:00
4          4.0  9284.5 2019-11-03 02:47:00 2019-11-03 02:42:00
...        ...     ...                 ...                 ...
10444908   4.0  9323.5 2019-11-05 05:48:00 2019-11-05 05:43:00
10444909   4.0  9321.0 2019-11-05 05:48:00 2019-11-05 05:43:00
10444910   4.0  9323.5 2019-11-05 05:48:00 2019-11-05 05:43:00
10444911   4.0  9321.0 2019-11-05 05:48:00 2019-11-05 05:43:00
10444912   4.0  9323.5 2019-11-05 05:48:00 2019-11-05 05:43:00

I want to get the last price for each 'time' and put that data into a new DataFrame. For example, for the data above I just want to see:

          size   price                time           orig_time
4          4.0  9284.5 2019-11-03 02:47:00 2019-11-03 02:42:00
...        ...     ...                 ...                 ...
10444912   4.0  9323.5 2019-11-05 05:48:00 2019-11-05 05:43:00

The code I have for these outputs is the following:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import csv
headers = ['ticker', 'size', 'price', 'unix','type','time']
dtypes = {'ticker': 'str', 'size': 'float', 'price': 'float', 'unix': 'float','type': 'str','time': 'str'}
parse_dates = ['time']
btcnow = pd.read_csv('new 113-115.csv', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)
now3 = pd.DataFrame(btcnow, columns=['size','time','unix','price'])

from datetime import datetime, timedelta

time_interval = timedelta(minutes = 5)
df = now3[[ 'time', 'size', 'price']]

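# extract time and size for the merge; copy first so the shift below
# does not write into a slice of df (avoids SettingWithCopyWarning)
df_time_size = df[['time', 'size']].copy()
df_time_size['time'] = df_time_size['time'] + time_interval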

# inner join dataframe by size&time
df = df_time_size.merge(df[['time', 'size', 'price']], how = 'inner')
df['orig_time'] = df['time'] - time_interval

# keep only the rows with size 4, with the columns in display order
result = df.loc[df['size'] == 4, ['size', 'price', 'time', 'orig_time']]

I tried the following, but it returns only a single row with the last price. I need one row for every time at which size 4 occurs, holding just the last price at that time (or the average price at that time, whichever is possible):

df = df.groupby('size').apply(lambda x: x.sort_values('time', ascending=False).head(1)).reset_index(drop=True)

Upvotes: 0

Views: 58

Answers (1)

Chris

Reputation: 29742

Group by 'time' and use DataFrameGroupBy.last, which returns the last non-null value of each column within each group:

new_df = df.groupby('time').last().reset_index()
print(new_df)

Output:

                  time  size   price            orig_time
0  2019-11-03 02:47:00   4.0  9284.5  2019-11-03 02:42:00
1  2019-11-05 05:48:00   4.0  9323.5  2019-11-05 05:43:00
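
The question also mentions the average price per time and shows a desired output that keeps the original row labels. A minimal sketch of both, alongside the groupby approach, might look like the following. The small frame here is hypothetical (its values are invented for illustration and are not the asker's data), and drop_duplicates is swapped in as an alternative, not something the answer above uses:

```python
import pandas as pd

# Hypothetical miniature of the question's frame; prices are invented.
df = pd.DataFrame({
    "size": [4.0, 4.0, 4.0, 4.0, 4.0],
    "price": [9284.0, 9285.0, 9284.5, 9323.5, 9321.0],
    "time": pd.to_datetime([
        "2019-11-03 02:47:00", "2019-11-03 02:47:00", "2019-11-03 02:47:00",
        "2019-11-05 05:48:00", "2019-11-05 05:48:00",
    ]),
})
df["orig_time"] = df["time"] - pd.Timedelta(minutes=5)

# Last value per timestamp; the index is rebuilt as 0..n-1.
by_group = df.groupby("time").last().reset_index()

# Last row per timestamp, keeping the original row labels.
last_rows = df.drop_duplicates(subset="time", keep="last")

# Average price per timestamp.
mean_price = df.groupby("time", as_index=False)["price"].mean()

print(by_group)
print(last_rows)
print(mean_price)
```

Both "last" variants rely on the rows already being in the desired order within each timestamp. If they are not, sorting first (for example with sort_values) would be needed.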

Upvotes: 2
