Reputation: 392
I have a dataframe as below from parsing json:
Date Close
0 2019-04-01T02:00:00Z 999.7
1 2019-04-01T02:01:00Z 999.6
2 2019-04-01T02:02:00Z 999.5
....
14811 2019-06-28T07:45:00Z 999.9
[14812 rows x 3 columns]
I used the below code to split them into 2 columns called "Day" and "Time:
data = json.loads(data_source)
df=pd.DataFrame.from_dict(data)
date=df['Date']
list_date=[]
list_time=[]
for item in date:
item=re.split('\b|[T,Z]',item)
date=item[0:1]
list_date.append(date)
time=item[1:2]
list_time.append(time)
df2['Time']=pd.Series(list_time)
df2['Day']=pd.Series(list_date)
df2['Close']=df['Close']
If I print df2, I'll receive a result like this:
Day Time Close
0 [2019-04-01] [02:00:00] 999.7
1 [2019-04-01] [02:01:00] 999.6
2 [2019-04-01] [02:02:00] 999.5
....
14811 [2019-06-28] [07:45:00] 999.9
[14812 rows x 3 columns]
Now, I want to extract a subdataframe by:
data_sub=df2.loc[df2['Day']=='2019-06-28']
But the result is just:
Empty DataFrame
Columns: [Day, Time, Close]
Index: []
I do not know why I am unable to extract that df2 after adding more columns by adding lists to a dataframe.
And please explain for me when/why I receive data in form of [value] and value (without []) when executing print(dataframe).
As above, print(original dataframe)
--> receive Date in form of value without [] (for example: 2019-04-01T02:00:00Z)
but when print(df2) gives Day and Time data in form of [] (for example: [2019-04-01] [02:00:00])
.
Upvotes: 3
Views: 155
Reputation: 863166
Use to_datetime
with Series.dt.tz_convert
for datetimes and for times Series.dt.time
, for date
s is used Series.dt.floor
for removed times for filtering:
df['Date'] = pd.to_datetime(df['Date']).dt.tz_convert(None)
df['Time'] = df['Date'].dt.time
#possible, but problem wit hfiltering
#df['Day'] = df['Date'].dt.date
df['Day'] = df['Date'].dt.floor('d')
print (df)
Date Close Time Day
0 2019-04-01 02:00:00 999.7 02:00:00 2019-04-01
1 2019-04-01 02:01:00 999.6 02:01:00 2019-04-01
2 2019-04-01 02:02:00 999.5 02:02:00 2019-04-01
data_sub=df.loc[df['Day']=='2019-04-01']
print (data_sub)
Date Close Time Day
0 2019-04-01 02:00:00 999.7 02:00:00 2019-04-01
1 2019-04-01 02:01:00 999.6 02:01:00 2019-04-01
2 2019-04-01 02:02:00 999.5 02:02:00 2019-04-01
In my opinion here is possible create DatetimeIndex
:
df['Date'] = pd.to_datetime(df['Date']).dt.tz_convert(None)
df = df.set_index('Date')
print (df)
Close
Date
2019-04-01 02:00:00 999.7
2019-04-01 02:01:00 999.6
2019-04-01 02:02:00 999.5
So possible use indexing and if need filter by times use DataFrame.at_time
:
print (df['2019-04-01'])
Close
Date
2019-04-01 02:00:00 999.7
2019-04-01 02:01:00 999.6
2019-04-01 02:02:00 999.5
print (df.at_time('02:00:00'))
Close
Date
2019-04-01 02:00:00 999.7
EDIT:
I think in your solution are created one item lists, so for select is necessary use indexing .str[0]
for scalars from lists:
print(df2['Day'].str[0])
data_sub=df2.loc[df2['Day'].str[0]=='2019-06-28']
EDIT1 - Select by times in first solution:
from datetime import time
data_sub=df[df['Time']==time(2,0,0)]
print (data_sub)
Date Close Time
0 2019-04-01 02:00:00 999.7 02:00:00
t = time(2,0,0)
data_sub=df.query('Time == @t')
print (data_sub)
Date Close Time
0 2019-04-01 02:00:00 999.7 02:00:00
Upvotes: 3
Reputation: 3739
try using pd.to_datetime()
df = pd.DataFrame(data={"date":['2019-04-01T02:00:00Z','2019-04-01T02:01:00Z','2019-04-01T02:02:00Z'],
"close":[ 999.7,1234,111]})
df['date'] = pd.to_datetime(df['date'])
df['Time'] = df['date'].dt.time
df['Day'] = df['date'].dt.date.astype(str)
data_sub = df.loc[df['Day']=='2019-04-01']
date close Time Day
0 2019-04-01 02:00:00+00:00 999.7 02:00:00 2019-04-01
1 2019-04-01 02:01:00+00:00 1234.0 02:01:00 2019-04-01
2 2019-04-01 02:02:00+00:00 111.0 02:02:00 2019-04-01
Upvotes: 0