Reputation: 526
Here I have a dataset with time and three inputs. Here I calculate the time difference using panda.
code is :
data['Time_different'] = pd.to_timedelta(data['time'].astype(str)).diff(-1).dt.total_seconds().div(60)
This is reading the difference of time in each row. But I want to write a code for find the time difference only specific rows which are having X3 values.
I tried to write the code using for loop. But it's not working properly. Without using for loop can we write the code.?
As you can see in my image I have three inputs, X1,X2,X3. Here when I used that code it is showing the time difference of X1,X2,X3.
Here what I want to write is getting the time difference for X3 inputs which are having a values.
time X3
6:00:00 0
7:00:00 2
8:00:00 0
9:00:00 50
10:00:00 0
11:00:00 0
12:00:00 0
13:45:00 0
15:00:00 0
16:00:00 0
17:00:00 0
18:00:00 0
19:00:00 20
Then here I want to skip the time of having 0 values of X3 and want to read only time difference of values of X3.
time x3
7:00:00 2(values having)
9:00:00 50
So the time difference is 2hrs
Then second:
9:00:00 50
19:00:00 20
Then time difference is 10 hrs
Like wise I want write the code or my whole column. Can anyone help me to solve this?
While putting the code then get the error with time difference in minus value.
Upvotes: 2
Views: 96
Reputation: 5500
You can try to:
X3
different from 0shift
join
:Full example:
data = """time X3
6:00:00 0
7:00:00 2
8:00:00 0
9:00:00 50
10:00:00 0
11:00:00 0
12:00:00 0
13:45:00 0
15:00:00 0
16:00:00 0
17:00:00 0
18:00:00 0
19:00:00 20"""
# Build dataframe from example
df = pd.read_csv(StringIO(data), sep=r'\s{1,}')
df['X1'] = np.random.randint(0,10,len(df)) # Add random values for "X1" column
df['X2'] = np.random.randint(0,10,len(df)) # Add random values for "X2" column
# Convert the time column to datetime object
df.time = pd.to_datetime(df.time, format="%H:%M:%S")
print(df)
# time X3 X1 X2
# 0 1900-01-01 06:00:00 0 5 4
# 1 1900-01-01 07:00:00 2 7 1
# 2 1900-01-01 08:00:00 0 2 8
# 3 1900-01-01 09:00:00 50 1 0
# 4 1900-01-01 10:00:00 0 3 9
# 5 1900-01-01 11:00:00 0 8 4
# 6 1900-01-01 12:00:00 0 0 2
# 7 1900-01-01 13:45:00 0 5 0
# 8 1900-01-01 15:00:00 0 5 7
# 9 1900-01-01 16:00:00 0 0 8
# 10 1900-01-01 17:00:00 0 6 7
# 11 1900-01-01 18:00:00 0 1 5
# 12 1900-01-01 19:00:00 20 4 7
# Compute difference
sub_df = df[df.X3 != 0]
out_values = (sub_df.time.dt.hour - sub_df.shift().time.dt.hour) \
.to_frame() \
.fillna(sub_df.time.dt.hour.iloc[0]) \
.rename(columns={'time': 'out'}) # Rename column
print(out_values)
# out
# 1 7.0
# 3 2.0
# 12 10.0
df = df.join(out_values) # Add out values
print(df)
# time X3 X1 X2 out
# 0 1900-01-01 06:00:00 0 2 9 NaN
# 1 1900-01-01 07:00:00 2 7 4 7.0
# 2 1900-01-01 08:00:00 0 6 6 NaN
# 3 1900-01-01 09:00:00 50 9 1 2.0
# 4 1900-01-01 10:00:00 0 2 9 NaN
# 5 1900-01-01 11:00:00 0 5 3 NaN
# 6 1900-01-01 12:00:00 0 6 4 NaN
# 7 1900-01-01 13:45:00 0 9 3 NaN
# 8 1900-01-01 15:00:00 0 3 0 NaN
# 9 1900-01-01 16:00:00 0 1 8 NaN
# 10 1900-01-01 17:00:00 0 7 5 NaN
# 11 1900-01-01 18:00:00 0 6 7 NaN
# 12 1900-01-01 19:00:00 20 1 5 10.0
Here is use .fillna(sub_df.time.dt.hour.iloc[0])
to replace the first values with the matching hours (since the subtract 0 does nothing). You can define your own rule for the value in fillna()
.
Upvotes: 2