Reputation: 429
Good day! There is the following time series dataset:
Time Value
1 1
2 1
3 1
4 2
5 2
6 2
7 2
8 3
9 3
10 4
11 4
12 5
I need to split and group data by value like this:
Value Time start, Time end
1 1 3
2 4 7
3 8 9
4 10 11
5 12 12
How to do it fast and in the most functional programming style on python? Various libraries can be used for example pandas, numpy.
Upvotes: 1
Views: 70
Reputation: 155
first get the count of Values
result = df.groupby('Value').agg(['count'])
result.columns = result.columns.get_level_values(1) #drop multi-index
result
count
Value
1 3
2 4
3 2
4 2
5 1
then cumcount to get time start
s = df.groupby('Value').cumcount()
result["time start"] = s[s == 0].index.tolist()
result
count time start
Value
1 3 0
2 4 3
3 2 7
4 2 9
5 1 11
finally,
result["time start"] += 1
result["time end"] = result["time start"] + result['count'] - 1
result
count time start time end
Value
1 3 1 3
2 4 4 7
3 2 8 9
4 2 10 11
5 1 12 12
Upvotes: 0
Reputation: 21
We can use pandas for this:
Solution:
data = {'Time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
'Value': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5]
}
df = pd.DataFrame(data, columns= ['Time', 'Value'])
res = df.groupby('Value').agg(['min', 'max'])
f_res = res.rename(columns = {'min': 'Start Time', 'max': 'End Time'}, inplace = False)
print(f_res)
Output:
Time
Start Time End Time
Value
1 1 3
2 4 7
3 8 9
4 10 11
5 12 12
Upvotes: 0
Reputation: 150735
Try with pandas:
df.groupby('Time')['Value'].agg(['min','max'])
Upvotes: 1