Huynh
Huynh

Reputation: 392

unable to use dataframe.loc to extract dataframe created by adding column by dataframe.serries

I have a dataframe as below from parsing json:

                Date          Close  
0      2019-04-01T02:00:00Z  999.7  
1      2019-04-01T02:01:00Z  999.6 
2      2019-04-01T02:02:00Z  999.5  
....
14811  2019-06-28T07:45:00Z  999.9  
[14812 rows x 3 columns]

I used the below code to split them into 2 columns called "Day" and "Time:

data = json.loads(data_source)
df=pd.DataFrame.from_dict(data)
date=df['Date']
list_date=[]
list_time=[]
for item in date:
    item=re.split('\b|[T,Z]',item)  
    date=item[0:1]  
    list_date.append(date)
    time=item[1:2]  
    list_time.append(time)
df2['Time']=pd.Series(list_time)
df2['Day']=pd.Series(list_date)
df2['Close']=df['Close']

If I print df2, I'll receive a result like this:

                Day        Time   Close  
0      [2019-04-01]  [02:00:00]  999.7  
1      [2019-04-01]  [02:01:00]  999.6 
2      [2019-04-01]  [02:02:00]  999.5  
....
14811  [2019-06-28]  [07:45:00]  999.9  
[14812 rows x 3 columns]

Now, I want to extract a subdataframe by:

data_sub=df2.loc[df2['Day']=='2019-06-28']

But the result is just:

Empty DataFrame
Columns: [Day, Time, Close]
Index: []

I do not know why I am unable to extract that df2 after adding more columns by adding lists to a dataframe.

And please explain for me when/why I receive data in form of [value] and value (without []) when executing print(dataframe). As above, print(original dataframe)--> receive Date in form of value without [] (for example: 2019-04-01T02:00:00Z)

but when print(df2) gives Day and Time data in form of [] (for example: [2019-04-01] [02:00:00]).

Upvotes: 3

Views: 155

Answers (2)

jezrael
jezrael

Reputation: 863166

Use to_datetime with Series.dt.tz_convert for datetimes and for times Series.dt.time, for dates is used Series.dt.floor for removed times for filtering:

df['Date'] = pd.to_datetime(df['Date']).dt.tz_convert(None)
df['Time'] = df['Date'].dt.time
#possible, but problem wit hfiltering
#df['Day'] = df['Date'].dt.date
df['Day'] = df['Date'].dt.floor('d')
print (df)
                 Date  Close      Time        Day
0 2019-04-01 02:00:00  999.7  02:00:00 2019-04-01
1 2019-04-01 02:01:00  999.6  02:01:00 2019-04-01
2 2019-04-01 02:02:00  999.5  02:02:00 2019-04-01

data_sub=df.loc[df['Day']=='2019-04-01']
print (data_sub)
                 Date  Close      Time        Day
0 2019-04-01 02:00:00  999.7  02:00:00 2019-04-01
1 2019-04-01 02:01:00  999.6  02:01:00 2019-04-01
2 2019-04-01 02:02:00  999.5  02:02:00 2019-04-01

In my opinion here is possible create DatetimeIndex:

df['Date'] = pd.to_datetime(df['Date']).dt.tz_convert(None)
df = df.set_index('Date')
print (df)
                     Close
Date                      
2019-04-01 02:00:00  999.7
2019-04-01 02:01:00  999.6
2019-04-01 02:02:00  999.5

So possible use indexing and if need filter by times use DataFrame.at_time:

print (df['2019-04-01'])
                     Close
Date                      
2019-04-01 02:00:00  999.7
2019-04-01 02:01:00  999.6
2019-04-01 02:02:00  999.5

print (df.at_time('02:00:00'))
                     Close
Date                      
2019-04-01 02:00:00  999.7

EDIT:

I think in your solution are created one item lists, so for select is necessary use indexing .str[0] for scalars from lists:

print(df2['Day'].str[0])

data_sub=df2.loc[df2['Day'].str[0]=='2019-06-28']

EDIT1 - Select by times in first solution:

from datetime import time
data_sub=df[df['Time']==time(2,0,0)]
print (data_sub)
                 Date  Close      Time
0 2019-04-01 02:00:00  999.7  02:00:00

t = time(2,0,0)
data_sub=df.query('Time == @t')
print (data_sub)
                 Date  Close      Time
0 2019-04-01 02:00:00  999.7  02:00:00

Upvotes: 3

tawab_shakeel
tawab_shakeel

Reputation: 3739

try using pd.to_datetime()

df = pd.DataFrame(data={"date":['2019-04-01T02:00:00Z','2019-04-01T02:01:00Z','2019-04-01T02:02:00Z'],
                       "close":[ 999.7,1234,111]})
df['date'] = pd.to_datetime(df['date'])

df['Time'] = df['date'].dt.time
df['Day'] = df['date'].dt.date.astype(str)

data_sub = df.loc[df['Day']=='2019-04-01']


             date               close      Time       Day
0   2019-04-01 02:00:00+00:00   999.7   02:00:00    2019-04-01
1   2019-04-01 02:01:00+00:00   1234.0  02:01:00    2019-04-01
2   2019-04-01 02:02:00+00:00   111.0   02:02:00    2019-04-01

Upvotes: 0

Related Questions