Reputation: 129
I have a firewall log file that includes date, hour, src_address, dest_address and Date.time. I want to create a 24 hour vector with 60 and 1 minutes time interval for everytime(ex;from 2018/01/01 to 2018/05/06) .Then in these intervals I want to find appearence of pair of src_address and dest_address.Lastly max of these appearence for every pair of src_address and dest_address.Here is my file;
date hour src_address dest_address Date.Time
1996 2018-04-14 08:24:01 1.11.201.19 172.16.16.100 2018-04-14 08:24:01
3702 2018-04-15 12:10:27 1.119.43.90 172.16.16.100 2018-04-15 12:10:27
1154 2018-04-14 00:59:27 1.119.43.90 172.16.16.153 2018-04-14 00:59:27
2414 2018-04-14 12:33:29 1.119.43.90 192.168.1.112 2018-04-14 12:33:29
18013 2018-04-28 18:49:05 1.171.43.133 172.16.16.5 2018-04-28 18:49:05
18015 2018-04-28 18:49:05 1.171.43.133 172.16.16.5 2018-04-28 18:49:05
6903 2018-04-25 21:31:52 1.179.191.82 172.16.16.5 2018-04-25 21:31:52
11741 2018-04-27 01:08:43 1.179.191.82 192.168.1.111 2018-04-27 01:08:43
11933 2018-04-27 02:00:10 1.179.191.82 192.168.1.111 2018-04-27 02:00:10
11023 2018-04-26 21:39:39 1.179.191.82 192.168.1.112 2018-04-26 21:39:39
11175 2018-04-26 22:31:01 1.179.191.82 192.168.1.112 2018-04-26 22:31:01
13073 2018-04-27 08:24:58 1.180.72.186 172.16.16.153 2018-04-27 08:24:58
13735 2018-04-27 12:07:34 1.180.72.186 172.16.16.153 2018-04-27 12:07:34
2752 2018-04-14 19:34:53 1.202.165.40 172.16.16.153 2018-04-14 19:34:53
4046 2018-04-15 18:16:40 1.203.84.52 172.16.16.5 2018-04-15 18:16:40
4048 2018-04-15 18:18:43 1.203.84.52 192.168.1.112 2018-04-15 18:18:43
3020 2018-04-15 01:35:40 1.209.171.4 192.168.1.111 2018-04-15 01:35:40
4870 2018-04-16 05:33:42 1.214.34.114 172.16.16.100 2018-04-16 05:33:42
7025 2018-04-25 22:28:06 1.214.34.114 172.16.16.100 2018-04-25 22:28:06
4262 2018-04-15 23:31:56 1.214.34.114 172.16.16.153 2018-04-15 23:31:56
9369 2018-04-26 10:32:50 1.214.34.114 172.16.16.153 2018-04-26 10:32:50
2716 2018-04-14 18:49:30 1.214.34.114 172.16.16.5 2018-04-14 18:49:30
9563 2018-04-26 12:34:58 1.214.34.114 172.16.16.5 2018-04-26 12:34:58
1110 2018-04-14 00:27:02 1.214.34.114 192.168.1.111 2018-04-14 00:27:02
4470 2018-04-16 01:27:32 1.214.34.114 192.168.1.112 2018-04-16 01:27:32
9581 2018-04-26 12:55:39 1.55.249.92 172.16.16.153 2018-04-26 12:55:39
2970 2018-04-15 00:01:18 1.55.249.92 172.16.16.5 2018-04-15 00:01:18
15329 2018-04-27 21:53:16 1.55.249.92 172.16.16.5 2018-04-27 21:53:16
15537 2018-04-28 00:02:30 1.55.249.92 172.16.16.5 2018-04-28 00:02:30
19249 2018-04-29 06:28:04 1.71.188.254 172.16.16.100 2018-04-29 06:28:04
19243 2018-04-29 06:28:04 1.71.188.254 172.16.16.153 2018-04-29 06:28:04
19241 2018-04-29 06:28:04 1.71.188.254 172.16.16.159 2018-04-29 06:28:04
19239 2018-04-29 06:28:04 1.71.188.254 172.16.16.5 2018-04-29 06:28:04
19247 2018-04-29 06:28:04 1.71.188.254 192.168.1.111 2018-04-29 06:28:04
19245 2018-04-29 06:28:04 1.71.188.254 192.168.1.112 2018-04-29 06:28:04
6315 2018-04-25 18:56:08 1.85.18.88 172.16.16.153 2018-04-25 18:56:08
14623 2018-04-27 16:41:00 1.85.18.88 172.16.16.153 2018-04-27 16:41:00
And here is my expectation;
src_address dest_address max(per hour) max(per minute)
2 1.11.201.19 172.16.16.100 1 1
3 1.119.43.90 172.16.16.100 1 1
4 1.119.43.90 172.16.16.153 1 1
5 1.119.43.90 192.168.1.112 1 1
6 1.171.43.133 172.16.16.5 2 2
Upvotes: 1
Views: 199
Reputation: 20095
Quite a few things have to be done to get the summaries data. One can use dplyr
, tidyr
and lubridate
packages to transform data.
The approach:
- Create
DateTime
column by uniting date and hour and converting toymd_hms
- Group on
src_addres
,dest_address
, andYear-Month-Day Hour
to calculate hourly occurrence- Group on
src_addres
,dest_address
, andYear-Month-Day Hour:Min
to calculate > per min occurrence- Group on
src_addres
,dest_address
and summarize to get max of hourly and per min occurrence
library(dplyr)
library(tidyr)
library(lubridate)
df %>% unite("DateTime", c("date","hour"), sep=" ") %>%
mutate(DateTime = ymd_hms(DateTime)) %>%
group_by(src_addres, dest_address, YMD_H = format(DateTime, "%Y-%m-%d %H")) %>%
mutate(HourlyAppearance = n()) %>%
group_by(src_addres, dest_address, YMD_HM = format(DateTime, "%Y-%m-%d %H:%M")) %>%
mutate(PerMinAppearance = n()) %>%
group_by(src_addres, dest_address) %>%
summarise( 'max(per hour)' = max(HourlyAppearance),
'max(per min)' = max(PerMinAppearance)) %>%
as.data.frame()
# src_addres dest_address max(per hour) max(per min)
# 1 1.11.201.19 172.16.16.100 1 1
# 2 1.119.43.90 172.16.16.100 1 1
# 3 1.119.43.90 172.16.16.153 1 1
# 4 1.119.43.90 192.168.1.112 1 1
# 5 1.171.43.133 172.16.16.5 2 2
# 6 1.179.191.82 172.16.16.5 1 1
# 7 1.179.191.82 192.168.1.111 1 1
# 8 1.179.191.82 192.168.1.112 1 1
# 9 1.180.72.186 172.16.16.153 1 1
# 10 1.202.165.40 172.16.16.153 1 1
# 11 1.203.84.52 172.16.16.5 1 1
# 12 1.203.84.52 192.168.1.112 1 1
# 13 1.209.171.4 192.168.1.111 1 1
# 14 1.214.34.114 172.16.16.100 1 1
# 15 1.214.34.114 172.16.16.153 1 1
# 16 1.214.34.114 172.16.16.5 1 1
# 17 1.214.34.114 192.168.1.111 1 1
# 18 1.214.34.114 192.168.1.112 1 1
# 19 1.55.249.92 172.16.16.153 1 1
# 20 1.55.249.92 172.16.16.5 1 1
# 21 1.71.188.254 172.16.16.100 1 1
# 22 1.71.188.254 172.16.16.153 1 1
# 23 1.71.188.254 172.16.16.159 1 1
# 24 1.71.188.254 172.16.16.5 1 1
# 25 1.71.188.254 192.168.1.111 1 1
# 26 1.71.188.254 192.168.1.112 1 1
# 27 1.85.18.88 172.16.16.153 1 1
Data:
OP hasn't provided data in a pretty simple format. The inclusion of date and time columns has made it more difficult. Perhaps that be reason for low response to this question. I preferred to read date
and time
part separately and then unite
those to get Date/Time
.
strtext <- "Sl date hour src_addres dest_address Date_t Time_t
1996 2018-04-14 08:24:01 1.11.201.19 172.16.16.100 2018-04-14 08:24:01
3702 2018-04-15 12:10:27 1.119.43.90 172.16.16.100 2018-04-15 12:10:27
1154 2018-04-14 00:59:27 1.119.43.90 172.16.16.153 2018-04-14 00:59:27
2414 2018-04-14 12:33:29 1.119.43.90 192.168.1.112 2018-04-14 12:33:29
18013 2018-04-28 18:49:05 1.171.43.133 172.16.16.5 2018-04-28 18:49:05
18015 2018-04-28 18:49:05 1.171.43.133 172.16.16.5 2018-04-28 18:49:05
6903 2018-04-25 21:31:52 1.179.191.82 172.16.16.5 2018-04-25 21:31:52
11741 2018-04-27 01:08:43 1.179.191.82 192.168.1.111 2018-04-27 01:08:43
11933 2018-04-27 02:00:10 1.179.191.82 192.168.1.111 2018-04-27 02:00:10
11023 2018-04-26 21:39:39 1.179.191.82 192.168.1.112 2018-04-26 21:39:39
11175 2018-04-26 22:31:01 1.179.191.82 192.168.1.112 2018-04-26 22:31:01
13073 2018-04-27 08:24:58 1.180.72.186 172.16.16.153 2018-04-27 08:24:58
13735 2018-04-27 12:07:34 1.180.72.186 172.16.16.153 2018-04-27 12:07:34
2752 2018-04-14 19:34:53 1.202.165.40 172.16.16.153 2018-04-14 19:34:53
4046 2018-04-15 18:16:40 1.203.84.52 172.16.16.5 2018-04-15 18:16:40
4048 2018-04-15 18:18:43 1.203.84.52 192.168.1.112 2018-04-15 18:18:43
3020 2018-04-15 01:35:40 1.209.171.4 192.168.1.111 2018-04-15 01:35:40
4870 2018-04-16 05:33:42 1.214.34.114 172.16.16.100 2018-04-16 05:33:42
7025 2018-04-25 22:28:06 1.214.34.114 172.16.16.100 2018-04-25 22:28:06
4262 2018-04-15 23:31:56 1.214.34.114 172.16.16.153 2018-04-15 23:31:56
9369 2018-04-26 10:32:50 1.214.34.114 172.16.16.153 2018-04-26 10:32:50
2716 2018-04-14 18:49:30 1.214.34.114 172.16.16.5 2018-04-14 18:49:30
9563 2018-04-26 12:34:58 1.214.34.114 172.16.16.5 2018-04-26 12:34:58
1110 2018-04-14 00:27:02 1.214.34.114 192.168.1.111 2018-04-14 00:27:02
4470 2018-04-16 01:27:32 1.214.34.114 192.168.1.112 2018-04-16 01:27:32
9581 2018-04-26 12:55:39 1.55.249.92 172.16.16.153 2018-04-26 12:55:39
2970 2018-04-15 00:01:18 1.55.249.92 172.16.16.5 2018-04-15 00:01:18
15329 2018-04-27 21:53:16 1.55.249.92 172.16.16.5 2018-04-27 21:53:16
15537 2018-04-28 00:02:30 1.55.249.92 172.16.16.5 2018-04-28 00:02:30
19249 2018-04-29 06:28:04 1.71.188.254 172.16.16.100 2018-04-29 06:28:04
19243 2018-04-29 06:28:04 1.71.188.254 172.16.16.153 2018-04-29 06:28:04
19241 2018-04-29 06:28:04 1.71.188.254 172.16.16.159 2018-04-29 06:28:04
19239 2018-04-29 06:28:04 1.71.188.254 172.16.16.5 2018-04-29 06:28:04
19247 2018-04-29 06:28:04 1.71.188.254 192.168.1.111 2018-04-29 06:28:04
19245 2018-04-29 06:28:04 1.71.188.254 192.168.1.112 2018-04-29 06:28:04
6315 2018-04-25 18:56:08 1.85.18.88 172.16.16.153 2018-04-25 18:56:08
14623 2018-04-27 16:41:00 1.85.18.88 172.16.16.153 2018-04-27 16:41:00"
df <- read.table(text = strtext,header = TRUE, stringsAsFactors = FALSE)
Upvotes: 1