team

Reputation: 526

How to get the time difference between specific rows of one column using Python

I have a dataset with a time column and three inputs. I calculate the time difference using pandas.

The code is:

data['Time_different'] = pd.to_timedelta(data['time'].astype(str)).diff(-1).dt.total_seconds().div(60)

This computes the time difference between every pair of consecutive rows. However, I want the time difference only between the rows where X3 has a value.

I tried writing it with a for loop, but it does not work properly. Can this be done without a for loop?


I have three inputs: X1, X2 and X3. The code above gives the time difference for every row, whichever input has a value.

What I want is the time difference between the rows where X3 has a value.

time        X3
6:00:00     0
7:00:00     2
8:00:00     0
9:00:00     50
10:00:00    0
11:00:00    0
12:00:00    0
13:45:00    0
15:00:00    0
16:00:00    0
17:00:00    0
18:00:00    0
19:00:00    20

I want to skip the rows where X3 is 0 and take the time difference only between the rows where X3 has a value.

time        X3
7:00:00     2    (rows that have a value)
9:00:00     50

So the time difference is 2 hours.

Then the second pair:

time        X3
9:00:00     50
19:00:00    20

So the time difference is 10 hours.

I want to apply this to my whole column in the same way. Can anyone help me solve this?

When I run my code, the time differences come out as negative values.


Upvotes: 2

Views: 96

Answers (1)

Alexandre B.

Reputation: 5500

You can try the following:

  • Find the rows where X3 is different from 0
  • Compute the difference in hours using shift
  • Update the dataframe using join

Full example:

from io import StringIO

import numpy as np
import pandas as pd

data = """time      X3
6:00:00     0
7:00:00     2
8:00:00     0
9:00:00     50
10:00:00    0
11:00:00    0
12:00:00    0
13:45:00    0
15:00:00    0
16:00:00    0
17:00:00    0
18:00:00    0
19:00:00    20"""
# Build dataframe from example
df = pd.read_csv(StringIO(data), sep=r'\s+')   # Split columns on any run of whitespace
df['X1'] = np.random.randint(0,10,len(df))   # Add random values for "X1" column
df['X2'] = np.random.randint(0,10,len(df))   # Add random values for "X2" column

# Convert the time column to datetime object
df.time = pd.to_datetime(df.time, format="%H:%M:%S")
print(df)
#                   time  X3  X1  X2
# 0  1900-01-01 06:00:00   0   5   4
# 1  1900-01-01 07:00:00   2   7   1
# 2  1900-01-01 08:00:00   0   2   8
# 3  1900-01-01 09:00:00  50   1   0
# 4  1900-01-01 10:00:00   0   3   9
# 5  1900-01-01 11:00:00   0   8   4
# 6  1900-01-01 12:00:00   0   0   2
# 7  1900-01-01 13:45:00   0   5   0
# 8  1900-01-01 15:00:00   0   5   7
# 9  1900-01-01 16:00:00   0   0   8
# 10 1900-01-01 17:00:00   0   6   7
# 11 1900-01-01 18:00:00   0   1   5
# 12 1900-01-01 19:00:00  20   4   7

# Compute difference
sub_df = df[df.X3 != 0]
out_values = (sub_df.time.dt.hour - sub_df.shift().time.dt.hour) \
            .to_frame() \
            .fillna(sub_df.time.dt.hour.iloc[0]) \
            .rename(columns={'time': 'out'})  # Rename column
print(out_values)
#      out
# 1    7.0
# 3    2.0
# 12  10.0

df = df.join(out_values)                # Add out values
print(df)
#                   time  X3  X1  X2   out
# 0  1900-01-01 06:00:00   0   2   9   NaN
# 1  1900-01-01 07:00:00   2   7   4   7.0
# 2  1900-01-01 08:00:00   0   6   6   NaN
# 3  1900-01-01 09:00:00  50   9   1   2.0
# 4  1900-01-01 10:00:00   0   2   9   NaN
# 5  1900-01-01 11:00:00   0   5   3   NaN
# 6  1900-01-01 12:00:00   0   6   4   NaN
# 7  1900-01-01 13:45:00   0   9   3   NaN
# 8  1900-01-01 15:00:00   0   3   0   NaN
# 9  1900-01-01 16:00:00   0   1   8   NaN
# 10 1900-01-01 17:00:00   0   7   5   NaN
# 11 1900-01-01 18:00:00   0   6   7   NaN
# 12 1900-01-01 19:00:00  20   1   5  10.0

Here I use .fillna(sub_df.time.dt.hour.iloc[0]) to replace the first value with its own hour, since it has no previous row to subtract (which is equivalent to subtracting 0). You can define your own rule for the value in fillna().
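Note that the example above subtracts only the hour component (dt.hour), so minutes are ignored. If some of the rows with a value in X3 fall on times such as 13:45, a variant based on full timedeltas may be more suitable. The following is a minimal sketch under that assumption, not the only way to do it; the names elapsed, has_value and hours_since_prev are made up for illustration. It uses diff() rather than the diff(-1) from the question: diff(-1) subtracts the next row from the current one, which is why increasing times produce negative values.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "time": ["6:00:00", "7:00:00", "8:00:00", "9:00:00", "10:00:00",
             "11:00:00", "12:00:00", "13:45:00", "15:00:00", "16:00:00",
             "17:00:00", "18:00:00", "19:00:00"],
    "X3": [0, 2, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 20],
})

# Parse the "H:MM:SS" strings as timedeltas so minutes and seconds are kept
elapsed = pd.to_timedelta(df["time"])

# Boolean mask of the rows where X3 has a value
has_value = df["X3"] != 0

# diff() subtracts the previous selected row from the current one.
# The assignment aligns on the index, so rows with X3 == 0 stay NaN,
# and the first selected row is NaN because it has no predecessor.
df["hours_since_prev"] = elapsed[has_value].diff().dt.total_seconds() / 3600

print(df[has_value])
# hours_since_prev is NaN for row 1, 2.0 for row 3 and 10.0 for row 12
```

Because the result is assigned back by index, no join is needed. If the first selected row should hold a value instead of NaN, a fillna() rule can be applied to the new column in the same way as above.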

Upvotes: 2
