Reputation: 27
The textsample
below is in one column. Using R, I hope to separate it into 5 columns with the following headings: "Name" , "Location", "Date", "Time", "Warning" . I have tried separate()
and strsplit()
and haven't succeeded yet. I hope someone here can help.
textsample <- "Name : York-APC-UPS\r\n
Location : York SCATS Zigzag Road\r\n
Contact : Mechanical services\r\n
\r\n
http://York-APC-UPS.domain25.minortracks.wa.gov.au\r\n
http://192.168.70.56\r\n
http://FE81::3C0:B8FF:FE6D:8065\r\n
Serial Number : 5A1149T24253\r\n
Device Serial Number : 5A1149T24253\r\n
Date : 12/06/2018\r\n
Time : 08:45:46\r\n
Code : 0x0125\r\n
\r\n
Warning : A high humidity threshold violation exists for integrated Environmental Monitor TH Sensor
(Port 1 Temp 1 at Port 1) reporting over 50%CD.\r\n"
Upvotes: 2
Views: 263
Reputation: 33643
Base R (assuming values stay on same line):
extract_value <- function(text, key) {
regmatches(text, gregexpr(sprintf("(?<=%s : )[^\\r]+", key), text, perl = TRUE))
}
cols <- c("Name" , "Location", "Date", "Time", "Warning")
as.data.frame(sapply(cols, \(x) extract_value(textsample, x)))
# Name Location Date Time
# 1 York-APC-UPS York SCATS Zigzag Road 12/06/2018 08:45:46
# Warning
# 1 A high humidity threshold violation exists for integrated Environmental Monitor TH Sensor \n (Port 1 Temp 1 at Port 1) reporting over 50%CD.
Upvotes: 1
Reputation: 2132
I hope you have resolved your problem by now.
Anyway, here's an alternative take based on regex separator instead of declaring all columns. Bit cleaning and formatting.
library(tidyverse)
library(hms)
# ----------------------
textsample <- textsample %>%
str_split_1("\r+\n+") %>%
as_tibble_col("text") %>%
transmute(
label = str_extract(text, "^.*?(?=\\s:\\s)"),
text = str_remove(text, "^.*?\\s:\\s"),
across(where(is.character), \(x) str_squish(x)),
label = str_replace_all(str_to_lower(label), "\\s", "_")) %>%
filter(!is.na(label)) %>%
pivot_wider(names_from = label, values_from = text) %>%
mutate(date = dmy(date), time = hms::as_hms(time))
The output:
> textsample
# A tibble: 1 × 9
name location contact serial_number device_serial_number date time code warning
<chr> <chr> <chr> <chr> <chr> <date> <time> <chr> <chr>
1 York-APC-UPS York SCATS Zigzag Road Mechanical services 5A1149T24253 5A1149T24253 2018-06-12 08:45:46 0x0125 A high humidity threshold violation exists for integrated Environmental Monitor TH Sensor (Port 1 Temp 1 a…
Upvotes: 0
Reputation: 24878
Here's an approach that should at least get you started:
We can use extract
from tidyr
extract the text of interest with regular expressions.
Then we can use mutate_all
to apply the same str_replace
to get rid of the labels.
library(dplyr)
library(tidyr)
library(stringr)
as.data.frame(extsample) %>%
extract(1, into=c("Name","Location","Date","Time","Warning"),
regex = "(Name : .+)[^$]*(Location : .+)[^$]*(Date : .+)[^$]*(Time : .+)[^$]*(Warning : .+)[^$]*") %>%
mutate_all(list(~str_replace(.,"^\\w+ : ","")))
# Name Location Date Time
#1 York-APC-UPS York SCATS Zigzag Road 12/06/2018 08:45:46
# Warning
#1 A high humidity threshold violation exists for integrated Environmental Monitor TH Sensor
This relies on capturing groups with ()
, see help(tidyr::extract)
for details. We use [^$]*
to match anything other than the end of the string 0 or more times between the groups.
Note the first argument to extract
is 1
, which indicates the first (and only) column of the data.frame
I made from your example data. Change this as necessary.
Upvotes: 1