Reputation: 17
I have a weblog of about 1 million rows, and I want to extract the Date, Time and Status fields to form a new data.frame.
V1
2013-08-27 16:00:01 117.79.149.2 GET 200 0 0
2013-08-27 16:00:02 117.79.149.2 GET 404 0 0
2013-08-27 16:00:03 117.79.149.2 GET 200 0 0
2013-08-27 16:00:04 117.79.149.2 GET 404 0 0
to become
Date_Time Status
2013-08-27 16:00:01 200
2013-08-27 16:00:02 404
2013-08-27 16:00:03 200
2013-08-27 16:00:04 404
I know how to extract the elements I need with the following code:
temp<-unlist(strsplit(x," "))
Date_Time<-paste(temp[1],temp[2])
Status<-temp[5]
But I don't know how to run it row by row to get a new data.frame without a "for" loop. How can I use sapply or lapply to do this?
Upvotes: 0
Views: 91
Reputation: 238
mydf <- data.frame(V1=c("2013-08-27 16:00:01 117.79.149.2 GET 200 0 0",
"2013-08-27 16:00:02 117.79.149.2 GET 404 0 0",
"2013-08-27 16:00:03 117.79.149.2 GET 200 0 0",
"2013-08-27 16:00:04 117.79.149.2 GET 404 0 0"))
# With fixed-width fields (only valid if every line has the same layout,
# e.g. all IP addresses have the same length)
mydf[, c("Date_Time", "Status")] <- list(substring(mydf$V1, 1, 19),
                                         substring(mydf$V1, 38, 40))
# or based on the delimiter " ", which is closer to your attempt
# (each line has 7 fields, hence by = 7)
strings <- unlist(strsplit(as.character(mydf$V1), " "))
mydf[, c("Date_Time", "Status")] <- list(
  paste(strings[seq(1, length.out = nrow(mydf), by = 7)],
        strings[seq(2, length.out = nrow(mydf), by = 7)]),
  strings[seq(5, length.out = nrow(mydf), by = 7)])
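The fixed-width version depends on the status code sitting at characters 38 to 40 of every line, which breaks as soon as an IP address is shorter or longer. A minimal sketch of a sanity check, assuming the raw lines are in a character vector (the name `log_lines` and the second sample line are made up for illustration):

```r
# Hypothetical sample: the second line has a shorter IP address
log_lines <- c("2013-08-27 16:00:01 117.79.149.2 GET 200 0 0",
               "2013-08-27 16:00:02 10.0.0.1 GET 404 0 0")

# Fixed positions misread the line with the shorter IP ...
fixed_status <- substring(log_lines, 38, 40)

# ... which shows up when checking that each result is exactly three digits
looks_ok <- grepl("^[0-9]{3}$", fixed_status)

# Splitting on the delimiter does not depend on field widths
split_status <- sapply(strsplit(log_lines, " "), "[[", 5)
```

If `all(looks_ok)` is not `TRUE` on the real log, the delimiter-based version is probably the safer choice.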
Upvotes: 0
Reputation: 7794
You can use sapply:
example <- c("asdf asdwer dsf cswe asd","asfdw ewr cswe sdf wers")
split.example <- strsplit(example," ")
example.2 <- sapply(split.example,"[[",2)
This gives:
> example.2
[1] "asdwer" "ewr"
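Here `"[["` is the extraction operator passed as a function, and the `2` is forwarded to it as the index, so each list element `x` becomes `x[[2]]`. A minimal sketch of the equivalent explicit form, with `vapply` as a stricter variant that declares the expected result type (same `example` data as above):

```r
example <- c("asdf asdwer dsf cswe asd", "asfdw ewr cswe sdf wers")
split.example <- strsplit(example, " ")

# Same as sapply(split.example, "[[", 2), written out as a function
explicit <- sapply(split.example, function(x) x[[2]])

# vapply stops with an error if a result is not a single character string
strict <- vapply(split.example, "[[", character(1), 2)
```

Both `explicit` and `strict` hold the second word of each string.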
Just to make this a complete answer, using dat provided by @Sven:
temp <- strsplit(as.character(dat$V1)," ")
new.df <- data.frame(Date_Time = paste(sapply(temp, "[[", 1),
                                       sapply(temp, "[[", 2)),
                     Status = sapply(temp, "[[", 5))
> new.df
Date_Time Status
1 2013-08-27 16:00:01 200
2 2013-08-27 16:00:02 404
3 2013-08-27 16:00:03 200
4 2013-08-27 16:00:04 404
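With about a million rows, calling `sapply` three times walks the split list three times. One possible alternative, sketched here under the assumption that every line has exactly 7 space-separated fields as in the sample, is to reshape the pieces into a character matrix once and pick columns from it:

```r
dat <- data.frame(V1 = c("2013-08-27 16:00:01 117.79.149.2 GET 200 0 0",
                         "2013-08-27 16:00:02 117.79.149.2 GET 404 0 0"),
                  stringsAsFactors = FALSE)

temp <- strsplit(as.character(dat$V1), " ")

# One row per log line, one column per field (assumes 7 fields per line)
fields <- matrix(unlist(temp), ncol = 7, byrow = TRUE)

new.df <- data.frame(Date_Time = paste(fields[, 1], fields[, 2]),
                     Status = fields[, 5],
                     stringsAsFactors = FALSE)
```

If some lines have a different number of fields, the matrix will be misaligned, so the `sapply` version above is more forgiving in that case.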
Upvotes: 0
Reputation: 81733
A solution based on regular expressions:
with(dat, data.frame(Date_Time = gsub("(.*\\:[0-9]+) .*", "\\1", V1),
                     Status = gsub(".*T ([0-9]+) .*", "\\1", V1)))
# Date_Time Status
# 1 2013-08-27 16:00:01 200
# 2 2013-08-27 16:00:02 404
# 3 2013-08-27 16:00:03 200
# 4 2013-08-27 16:00:04 404
where dat is your data frame:
dat <- data.frame(V1 = readLines(
textConnection("2013-08-27 16:00:01 117.79.149.2 GET 200 0 0
2013-08-27 16:00:02 117.79.149.2 GET 404 0 0
2013-08-27 16:00:03 117.79.149.2 GET 200 0 0
2013-08-27 16:00:04 117.79.149.2 GET 404 0 0")))
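Note that the `.*T ` in the Status pattern relies on the request method ending in "T" (GET, POST, PUT). A sketch of a variant that counts space-separated fields instead, assuming the status is always the fifth field; the vector `x` and its `HEAD` line are made up for illustration:

```r
x <- c("2013-08-27 16:00:01 117.79.149.2 GET 200 0 0",
       "2013-08-27 16:00:05 117.79.149.2 HEAD 304 0 0")  # made-up HEAD request

# Fields 1-2 are date and time, 3 is the IP, 4 the method, 5 the status
pattern <- "^([^ ]+ [^ ]+) [^ ]+ [^ ]+ ([0-9]+) .*$"

res <- data.frame(Date_Time = sub(pattern, "\\1", x),
                  Status = sub(pattern, "\\2", x),
                  stringsAsFactors = FALSE)
```

Since the pattern is anchored with `^` and `$` and matches once per string, `sub` is enough here and `gsub` is not needed.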
Upvotes: 3