Reputation: 3816
I am trying to find the fastest and most elegant solution to the situation described below.
I have a large DataFrame which contains the record of some machine states.
For this example let's assume we have only two machines: one
and two
(in reality I've got much more, but this shouldn't matter). Every machine can be in one of two states: up
and down
.
Here is how an example record can be created:
import pandas as pd
df = pd.DataFrame({'time' : range(8),
'machine' : ["one","one","two","two","one","two","two","one"],
'status' : ["up","down","up","down","up","up","down","down"]})
and how it looks like:
time machine status
0 one up
1 one down
2 two up
3 two down
4 one up
5 two up
6 two down
7 one down
The records are strictly ordered (in reality by timestamps).
Now, what I want to achieve is to find situations, where both machines were in a high state (up
), and preferably return a subset of the DataFrame, showing such situations. In the example above, a subset between time 5
and 6
is interesting for me, as both one
and two
machines were in up
state at that time.
The brute force solution would be probably to go through the data frame and store the state of all machines in a list, checking if we have two of them in an up
state during every iteration, but maybe there is a more elegant solution?
All suggestions are more than welcome.
Upvotes: 0
Views: 622
Reputation: 353059
To get started, how about something like
df2 = df.pivot(index="time", columns="machine", values="status")
df2 = df2.fillna(method='ffill')
both_up = df2[(df2 == "up").all(axis=1)]
which produces
>>> both_up
machine one two
time
5 up up
given your example.
First, build the initial DataFrame
:
>>> df = pd.DataFrame({'time' : range(8),
'machine' : ["one","one","two","two","one","two","two","one"],
'status' : ["up","down","up","down","up","up","down","down"]})
>>> df
machine status time
0 one up 0
1 one down 1
2 two up 2
3 two down 3
4 one up 4
5 two up 5
6 two down 6
7 one down 7
Then pivot:
>>> df2 = df.pivot(index="time", columns="machine", values="status")
>>> df2
machine one two
time
0 up NaN
1 down NaN
2 NaN up
3 NaN down
4 up NaN
5 NaN up
6 NaN down
7 down NaN
We don't know anything about the state of machine two before time 2 -- unless we assume that only transitions are recorded and not states, so it must have been down before -- but the other missing states we can guess assuming all the relevant changes are caught. IOW, we can forward-fill:
>>> df2 = df2.fillna(method='ffill')
>>> df2
machine one two
time
0 up NaN
1 down NaN
2 down up
3 down down
4 up down
5 up up
6 up down
7 down down
And then we can look for the places where they're both up:
>>> (df2 == "up").all(axis=1)
time
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
dtype: bool
>>> both_up = df2[(df2 == "up").all(axis=1)]
>>> both_up
machine one two
time
5 up up
With just a little more effort you could get [5,6)
or some duration measure out of it but hopefully the above will help you get started.
Upvotes: 3