team
team

Reputation: 526

How to read specific date and time for specific values in csv file using python

Here I have a dataset with date ,time and one inputs. Here I want to read date and time together for specific values. Here I want to keep the length of csv file as same without changing.

Here 5 values contain time convert 00:00:00.

Here I used some code but it gave me with 0 days. First I convert that specific time into 00:00:00

data['date_time']= pd.to_datetime(data['date'] + " " + data['time'],
                        format='%d/%m/%Y %H:%M:%S', dayfirst=True)

data['duration'] = np.where(data['X3'].eq(5), np.timedelta64(0), pd.to_timedelta(data['date_time']))


print(data['duration'])


def f(x):
ts = x.total_seconds()
hours, remainder = divmod(ts, 3600)
minutes, seconds = divmod(remainder, 60)
return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds)) 

data['duration'] = data['duration'].apply(f)

output :

5      00:00:00
11     00:00:00
18     00:00:00
25     00:00:00
30     00:00:00
37     00:00:00
43     00:00:00
46     00:00:00
54     00:00:00
60     00:00:00
65     00:00:00
70     00:00:00
80     00:00:00
82     00:00:00
89     00:00:00
95     00:00:00
99     00:00:00
104    00:00:00
111    00:00:00
114    00:00:00
121    00:00:00

But what I expected output is:

    datetime	              x3
10/3/2018 6:15:00     	7
10/3/2018 00:00:00     	5
10/3/2018 7:45:00     	7
10/3/2018 9:00:00     	7
10/3/2018 9:25:00     	7
10/3/2018 00:00:00     	5
10/3/2018 11:00:00    	7
10/3/2018 11:30:00    	7
10/3/2018 13:30:00    	7
10/3/2018 00:00:00    	5
10/3/2018 15:00:00    	7
10/3/2018 15:25:00    	7
10/3/2018 16:25:00    	7
10/3/2018 00:00:00    	5
10/3/2018 19:00:00    	7
10/3/2018 19:30:00    	7

Means replace time with 00:00:00

date	time	x3	              T		                 x3
10/3/2018	6:15:00	7	10/3/2018 6:15:00		7
10/3/2018	6:45:00	5	10/3/2018 0:00:00		5
10/3/2018	7:45:00	7	10/3/2018 7:45:00		7
10/3/2018	9:00:00	7	10/3/2018 9:00:00	        7
10/3/2018	9:25:00	7	10/3/2018 9:25:00		7
10/3/2018	9:30:00	5	10/3/2018 0:00:00		5

Subset of my csv:

date	             time	x3
10/3/2018	6:15:00 	7
10/3/2018	6:45:00 	5
10/3/2018	7:45:00 	7
10/3/2018	9:00:00	        7 
10/3/2018	9:25:00   	7
10/3/2018	9:30:00	        5
10/3/2018	11:00:00	7
10/3/2018	11:30:00	7
10/3/2018	13:30:00	7
10/3/2018	13:50:00	5
10/3/2018	15:00:00	7
10/3/2018	15:25:00	7
10/3/2018	16:25:00	7
10/3/2018	18:00:00	5
10/3/2018	19:00:00	7
10/3/2018	19:30:00	7
10/3/2018	20:00:00	7
10/3/2018	22:05:00	7
10/3/2018	22:15:00	5
10/3/2018	23:40:00	7
10/4/2018	6:58:00	        5
10/4/2018	13:00:00	7
10/4/2018	16:00:00	7
10/4/2018	17:00:00	7
10/4/2018	18:00:00	7
10/5/2018	7:00:00	        7
10/5/2018	8:00:00  	7
10/5/2018	9:00:00	        7

My csv file

Upvotes: 1

Views: 150

Answers (2)

jezrael
jezrael

Reputation: 863741

You can remove times in date_time column only for rows with data['x3'].eq(5) with Series.mask and Series.dt.floor by days:

data['date_time']= pd.to_datetime(data['date'] + " " + data['time'],
                        format='%d/%m/%Y %H:%M:%S', dayfirst=True)

mask = data['x3'].eq(5)
data['date_time'] = data['date_time'].mask(mask, data['date_time'].dt.floor('d'))

print (data)
         date      time  x3           date_time
0   10/3/2018   6:15:00   7 2018-03-10 06:15:00
1   10/3/2018   6:45:00   5 2018-03-10 00:00:00
2   10/3/2018   7:45:00   7 2018-03-10 07:45:00
3   10/3/2018   9:00:00   7 2018-03-10 09:00:00
4   10/3/2018   9:25:00   7 2018-03-10 09:25:00
5   10/3/2018   9:30:00   5 2018-03-10 00:00:00
6   10/3/2018  11:00:00   7 2018-03-10 11:00:00
7   10/3/2018  11:30:00   7 2018-03-10 11:30:00
8   10/3/2018  13:30:00   7 2018-03-10 13:30:00
9   10/3/2018  13:50:00   5 2018-03-10 00:00:00
10  10/3/2018  15:00:00   7 2018-03-10 15:00:00
11  10/3/2018  15:25:00   7 2018-03-10 15:25:00
12  10/3/2018  16:25:00   7 2018-03-10 16:25:00
13  10/3/2018  18:00:00   5 2018-03-10 00:00:00
14  10/3/2018  19:00:00   7 2018-03-10 19:00:00
15  10/3/2018  19:30:00   7 2018-03-10 19:30:00
16  10/3/2018  20:00:00   7 2018-03-10 20:00:00
17  10/3/2018  22:05:00   7 2018-03-10 22:05:00
18  10/3/2018  22:15:00   5 2018-03-10 00:00:00
19  10/3/2018  23:40:00   7 2018-03-10 23:40:00
20  10/4/2018   6:58:00   5 2018-04-10 00:00:00
21  10/4/2018  13:00:00   7 2018-04-10 13:00:00
22  10/4/2018  16:00:00   7 2018-04-10 16:00:00
23  10/4/2018  17:00:00   7 2018-04-10 17:00:00
24  10/4/2018  18:00:00   7 2018-04-10 18:00:00
25  10/5/2018   7:00:00   7 2018-05-10 07:00:00
26  10/5/2018   8:00:00   7 2018-05-10 08:00:00
27  10/5/2018   9:00:00   7 2018-05-10 09:00:00

Upvotes: 1

moys
moys

Reputation: 8033

Remove duration from your code for T. it should be

T = data.loc[data['duration'] == match_time, ['date','duration','x3']]

When you put duration in there, what you are asking Pandas is to get the column duration when data['duration'] == match_time.

Upvotes: 0

Related Questions