Darth Ratus

Reputation: 49

Removing some special characters when reading text file with read.table in R

I have been saving the output of the Android command top into a text file using Python (bear with me, as this is an R question).

Unfortunately, I was missing a parameter that would save the output in a parseable form, without any ANSI escape sequences (see https://unix.stackexchange.com/questions/409053/how-to-disable-color-in-output-of-top-command). I was able to fix that problem using the suggestion in the link.

Since I discovered the fix recently, I have quite a few older non-processed files with these extra characters (see sample output below).

[Screenshot: sample output as displayed in Notepad]

[Screenshot: sample output as displayed in Notepad++]

Copying part of the output from the actual text file:

[s[999C[999B[6n[u[H[J[?25l[H[J[s[999C[999B[6n[uTasks: 279 total, 5 running, 274 sleeping, 0 stopped, 0 zombie

Mem: 1702176K total, 1661708K used, 41439232 free, 11583488 buffers

Swap: 425540K total, 345512K used, 81948672 free, 487176K cached

400%cpu 138%user 3%nice 235%sys 5%idle 0%iow 0%irq 19%sirq 0%host

[7m PID USER PR NI VIRT RES SHR S[%CPU] %MEM TIME+ ARGS [0m

I have an R program that reads these text files and does some post-processing, which works fairly well most of the time. In other cases, the extra characters in a given file require manual cleanup.

Is there a way to modify my read.table call so that R ignores these characters wherever they are found?

My code to read each text file and store the output in a CSV file is below. There is additional post-processing after that, which is not relevant to the question:

i <- 0  # counter used to number the output files
for (file in data_files) {
    i <- i + 1
    path <- file.path(folder, file)

    # The widest row determines how many columns read.table should create
    ncol <- max(count.fields(path, sep = ""))
    Top_Data_Frame <- read.table(path, header = FALSE, fill = TRUE, col.names = paste0("V", seq_len(ncol)))
    write.csv(Top_Data_Frame, file = file.path(folder, paste0("The_Whole_File", i, ".csv")), row.names = FALSE)
}

Suggestions are appreciated.

Upvotes: 0

Views: 418

Answers (1)

Mark

Reputation: 12558

Try this (it needs the stringr and dplyr packages):

library(stringr)
library(dplyr)

example <- "[s[999C[999B[6n[u[H[J[?25l[H[J[s[999C[999B[6n[uTasks: 279 total, 5 running, 274 sleeping, 0 stopped, 0 zombie

Mem: 1702176K total, 1661708K used, 41439232 free, 11583488 buffers

Swap: 425540K total, 345512K used, 81948672 free, 487176K cached

400%cpu 138%user 3%nice 235%sys 5%idle 0%iow 0%irq 19%sirq 0%host

[7m PID USER PR NI VIRT RES SHR S[%CPU] %MEM TIME+ ARGS [0m"

# Drop everything before "Tasks" and any non-ASCII characters
example <- str_remove_all(example, "(?ms)^(.*(?=Tasks))|([\\[a-zA-Z0-9]+ ])|[^[:ascii:]]")

str_match(example, "(?s)Tasks: (\\d+) total, (\\d+) running, (\\d+) sleeping, (\\d+) stopped, (\\d+) zombie\n\nMem: ([0-9K]+) total, ([0-9K]+) used, ([0-9K]+) free, ([0-9K]+) buffers\n\nSwap: ([0-9K]+) total, ([0-9K]+) used, ([0-9K]+) free, ([0-9K]+) cached\n\n([0-9%]+)cpu ([0-9%]+)user ([0-9%]+)nice ([0-9%]+)sys ([0-9%]+)idle ([0-9%]+)iow ([0-9%]+)irq ([0-9%]+)sirq ([0-9%]+)host\n\n\\[([0-9a-z]+) PID USER PR NI VIRT RES SHR S\\[%CPU\\] %MEM TIME\\+ ARGS \\[([0-9a-z]+)") %>% 
  as.data.frame() %>%
  select(-1) %>%
  setNames(c("Tasks_total", "Tasks_running", "Tasks_sleeping", "Tasks_stopped", "Tasks_zombie", "Mem_total", "Mem_used", "Mem_free", "Mem_buffers", "Swap_total", "Swap_used", "Swap_free", "Swap_cached", "cpu", "user", "nice", "sys", "idle", "iow", "irq", "sirq", "host", "PID", "CPU"))
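
If you would rather keep the read.table workflow from the question, a base R alternative is to clean the lines before parsing them. This is only a minimal sketch, and it assumes the files still contain the real ESC byte (0x1B) in front of each `[`, which the pasted sample does not show. `strip_ansi` and `read_top_file` are hypothetical helper names, not existing functions. The sketch also turns off quote and comment handling so that characters in the ARGS column are not misread.

```r
# Hypothetical helper: remove ANSI CSI escape sequences (ESC [ params letter).
# Assumes the ESC byte is present, so plain text like "S[%CPU]" is left alone.
strip_ansi <- function(x) {
  gsub("\033\\[[0-9;?]*[A-Za-z]", "", x, perl = TRUE)
}

# Hypothetical wrapper: clean the lines first, then hand them to read.table.
read_top_file <- function(path) {
  lines <- trimws(strip_ansi(readLines(path, warn = FALSE)))
  lines <- lines[nzchar(lines)]  # drop lines that are empty after cleaning

  # The widest whitespace-separated row determines the column count
  ncol <- max(lengths(strsplit(lines, "[[:space:]]+")))

  read.table(text = lines, header = FALSE, fill = TRUE,
             quote = "", comment.char = "",
             col.names = paste0("V", seq_len(ncol)))
}
```

Inside the loop from the question, `read_top_file(file.path(folder, file))` would then take the place of the `count.fields` and `read.table` calls. The pattern only covers sequences of the form ESC `[` ... letter, so other kinds of control codes would need their own handling.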

Upvotes: 0
