Reputation: 3018
I have a DataFrame that looks like the one below:
import datetime

data = [(datetime.datetime(2021, 2, 10, 7, 49, 7, 118658), u'12.100.90.10', u'100.100.12.1', u'LT_DOWN'),
(datetime.datetime(2021, 2, 10, 7, 49, 14, 312273), u'12.100.90.10', u'100.100.12.1', u'LT_UP'),
(datetime.datetime(2021, 2, 10, 7, 49, 21, 535932), u'12.100.90.10', u'100.100.12.1', u'LT_UP'),
(datetime.datetime(2021, 2, 10, 7, 50, 28, 725961), u'12.100.90.10', u'100.100.12.1', u'PL_DOWN'),
(datetime.datetime(2021, 2, 10, 7, 50, 32, 450853), u'10.100.80.10', u'10.55.10.1', u'PL_LOW'),
(datetime.datetime(2021, 2, 10, 7, 51, 32, 450853), u'10.10.80.10', u'10.55.10.1', u'MA_HIGH'),
(datetime.datetime(2021, 2, 10, 7, 52, 34, 264042), u'10.10.80.10', u'10.55.10.1', u'MA_DOWN')]
As you can see, data is logged every minute; I have only presented part of the complete data here.
This is how it looks after loading it into pandas:
date start end type
0 2021-02-10 07:49:07.118658 12.100.90.10 100.100.12.1 LT_DOWN
1 2021-02-10 07:49:14.312273 12.100.90.10 100.100.12.1 LT_UP
2 2021-02-10 07:49:21.535932 12.100.90.10 100.100.12.1 LT_UP
3 2021-02-10 07:50:28.725961 12.100.90.10 100.100.12.1 PL_DOWN
4 2021-02-10 07:50:32.450853 10.100.80.10 10.55.10.1 PL_LOW
5 2021-02-10 07:51:32.450853 10.10.80.10 10.55.10.1 MA_HIGH
6 2021-02-10 07:52:34.264042 10.10.80.10 10.55.10.1 PL_DOWN
Now, what I want is two things.
First, get the count of each value in type per minute, where only the first part of the value (split on _) is considered for the count. So my final output should look something like this:
date LT PL MA
0 2021-02-10 07:49 3 0 0
1 2021-02-10 07:50 0 2 0
2 2021-02-10 07:51 0 0 1
3 2021-02-10 07:52 0 1 0
Second, convert any non-zero count to True and any zero to False. So the final output would look like:
date LT PL MA
0 2021-02-10 07:49 True False False
1 2021-02-10 07:50 False True False
2 2021-02-10 07:51 False False True
3 2021-02-10 07:52 False True False
Now, regarding the first part, I can find the count of each value (split on the first part of _) over the whole DataFrame by doing something like this:
import pandas as pd

df = pd.DataFrame(data)
df.columns = ["date", "start", "end", "type"]
df = df.drop_duplicates()
issue = "LT"
# count rows whose type contains the issue prefix, per (start, end) pair
df_check = df[df['type'].str.contains(issue)].groupby(['start', 'end']).size().to_frame('icount').reset_index()
But how do I do this on a per-minute basis, and map the counts to True and False?
Upvotes: 1
Views: 172
Reputation: 15872
In the data you provided, the last item in type is MA_DOWN, but in the dataframe and in the results it is PL_DOWN; I think that was an error on your part. I have considered the dataframe version here.
You can use pandas.DataFrame.resample with pandas.get_dummies:
>>> pd.get_dummies(df['type'].str.extract(r'(\w+)_', expand=False)
... ).set_index(df['date']).resample('1T').sum()
LT MA PL
date
2021-02-10 07:49:00 3 0 0
2021-02-10 07:50:00 0 0 2
2021-02-10 07:51:00 0 1 0
2021-02-10 07:52:00 0 0 1
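Here str.extract(r'(\w+)_', expand=False) keeps only the prefix before the underscore (LT, PL, MA), get_dummies turns those prefixes into indicator columns, set_index(df['date']) gives resample a DatetimeIndex to bin on, and resample('1T').sum() adds the indicators up within one-minute bins, which yields the per-minute counts.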
Then astype(bool):
>>> pd.get_dummies(df['type'].str.extract(r'(\w+)_', expand=False)
... ).set_index(df['date']).resample('1T').sum().astype(bool)
LT MA PL
date
2021-02-10 07:49:00 True False False
2021-02-10 07:50:00 False False True
2021-02-10 07:51:00 False True False
2021-02-10 07:52:00 False False True
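For reference, here is a minimal end-to-end sketch of the whole pipeline, using the dataframe version of the data (PL_DOWN in the last row); the names counts and flags are just illustrative:
import datetime
import pandas as pd

data = [(datetime.datetime(2021, 2, 10, 7, 49, 7, 118658), u'12.100.90.10', u'100.100.12.1', u'LT_DOWN'),
        (datetime.datetime(2021, 2, 10, 7, 49, 14, 312273), u'12.100.90.10', u'100.100.12.1', u'LT_UP'),
        (datetime.datetime(2021, 2, 10, 7, 49, 21, 535932), u'12.100.90.10', u'100.100.12.1', u'LT_UP'),
        (datetime.datetime(2021, 2, 10, 7, 50, 28, 725961), u'12.100.90.10', u'100.100.12.1', u'PL_DOWN'),
        (datetime.datetime(2021, 2, 10, 7, 50, 32, 450853), u'10.100.80.10', u'10.55.10.1', u'PL_LOW'),
        (datetime.datetime(2021, 2, 10, 7, 51, 32, 450853), u'10.10.80.10', u'10.55.10.1', u'MA_HIGH'),
        (datetime.datetime(2021, 2, 10, 7, 52, 34, 264042), u'10.10.80.10', u'10.55.10.1', u'PL_DOWN')]

df = pd.DataFrame(data, columns=['date', 'start', 'end', 'type'])

# one-hot encode the prefix before '_', bin rows by minute, count per bin
counts = (pd.get_dummies(df['type'].str.extract(r'(\w+)_', expand=False))
            .set_index(df['date'])
            .resample('1T')
            .sum())

# non-zero counts -> True, zeros -> False
flags = counts.astype(bool)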
According to your comments, you can use pd.crosstab:
>>> pd.crosstab(
...     index=df['date'].dt.floor('1min'),
...     columns=[
...         df['start'].add('-').add(df['end']).rename('start-end'),
...         df['type'].str.extract(r'(\w+)_', expand=False)
...     ],
...     dropna=False
... )
start-end 10.10.80.10-10.55.10.1 10.100.80.10-10.55.10.1 12.100.90.10-100.100.12.1
type LT MA PL LT MA PL LT MA PL
date
2021-02-10 07:49:00 0 0 0 0 0 0 3 0 0
2021-02-10 07:50:00 0 0 0 0 0 1 0 0 1
2021-02-10 07:51:00 0 1 0 0 0 0 0 0 0
2021-02-10 07:52:00 0 0 1 0 0 0 0 0 0
Or,
>>> pd.crosstab(
...     index=df['date'].dt.floor('1min'),
...     columns=[
...         df['start'],
...         df['end'],
...         df['type'].str.extract(r'(\w+)_', expand=False)
...     ],
...     dropna=False
... )
start 10.10.80.10 10.100.80.10 12.100.90.10
end 10.55.10.1 100.100.12.1 10.55.10.1 100.100.12.1 10.55.10.1 100.100.12.1
type LT MA PL LT MA PL LT MA PL LT MA PL LT MA PL LT MA PL
date
2021-02-10 07:49:00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0
2021-02-10 07:50:00 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
2021-02-10 07:51:00 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2021-02-10 07:52:00 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
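Note that dropna=False is what keeps the (start, end) / type combinations that never occur in the data, so the all-zero columns still show up. And as in the first part, appending .astype(bool) to either crosstab call maps the non-zero per-minute counts to True and the zeros to False.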
Upvotes: 1