Wimpel

Reputation: 27772

Reading in inconsistent data from a text file

I'm trying to do the Advent of Code in R this year.

The Day 5 puzzle has a weird input format that I had to import into R:

    [D]    
[N] [C]    
[Z] [M] [P]
 1   2   3 

Unfortunately, I could not find a proper way to read this sample input, so I manually typed the stacked crates (this is what the puzzle is about) and sent them to a list of characters using:

L <- sapply(list("ZN", "MCD", "P"), strsplit, "")
# [[1]]
# [1] "Z" "N"
# 
# [[2]]
# [1] "M" "C" "D"
# 
# [[3]]
# [1] "P"

While my solution for the puzzle worked, I am left with the feeling that I should be able to read in the sample data automatically instead of typing it by hand. Any suggestions? I prefer data.table solutions, but all hints, tips, and solutions are of course welcome.

Upvotes: 1

Views: 53

Answers (2)

Stefano Barbi

Reputation: 3194

Two solutions: the first uses read.fwf, exploiting the fact that the text is laid out in fixed-width fields (four characters per stack).

txt <- "    [D]    
[N] [C]    
[Z] [M] [P]
 1   2   3 "

read.fwf(textConnection(txt), widths = c(4,4,4),
         n = 3,
         header = FALSE) |>
    apply(2, gsub, pattern = "[^A-Z]",
          replacement =  "",
          simplify = FALSE) |>
    Map(f=\(x) rev(x[x != ""]))

##> $V1
##> [1] "Z" "N"
##> 
##> $V2
##> [1] "M" "C" "D"
##> 
##> $V3
##> [1] "P"
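
The same fixed-width idea can also be sketched without hard-coding the number of stacks. This is only an illustration, assuming every stack occupies four characters (so the crate letters sit at positions 2, 6, 10, ...) and that the last line holds the stack labels; the variable names are made up for the example.

```r
# Sample input; the last line holds the stack labels.
txt <- "    [D]    \n[N] [C]    \n[Z] [M] [P]\n 1   2   3 "
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]

crate_rows <- lines[-length(lines)]            # drop the label line
n_stacks <- ceiling(max(nchar(lines)) / 4)     # assumption: 4 characters per stack
pos <- seq(2, by = 4, length.out = n_stacks)   # letter positions 2, 6, 10, ...

stacks <- lapply(pos, function(p) {
  top_down <- substring(crate_rows, p, p)      # "" when a line is too short
  rev(top_down[grepl("[A-Z]", top_down)])      # keep letters, bottom crate first
})
```

Because substring() returns an empty string past the end of a line, this variant should also tolerate input whose trailing spaces were trimmed.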

The second solution uses standard Linux utilities to preprocess an external file before piping it to read.delim:


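# tac reverses the line order, tr -d removes the brackets, tr -s squeezes repeated spaces
read.delim(pipe("tac input.txt | tr -d '[]' | tr -s ' '"),
           sep = " ",
           skip = 1,                      # skip the line with the stack numbers
           header = FALSE,
           colClasses = "character") |>   # keep crates such as "T" or "F" as text
    Map(f=\(x) x[x != ""])
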
##> $V1
##> [1] "Z" "N"
##> 
##> $V2
##> [1] "M" "C" "D"
##> 
##> $V3
##> [1] "P"
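
On systems without tac and tr, the same preprocessing steps can be approximated in base R. This is a rough sketch rather than a drop-in replacement: it reads from a string instead of input.txt, and, like the shell pipeline, it assumes that squeezing the spaces leaves every letter in its own column, which holds for this sample but may not hold when a row has several empty stacks before a crate.

```r
txt <- "    [D]    \n[N] [C]    \n[Z] [M] [P]\n 1   2   3 "
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]   # or readLines("input.txt")

cleaned <- rev(lines)                       # like tac: bottom line first
cleaned <- gsub("\\[|\\]", "", cleaned)     # like tr -d '[]'
cleaned <- gsub(" +", " ", cleaned)         # like tr -s ' '

crates <- read.delim(text = cleaned,
                     sep = " ",
                     skip = 1,              # skip the stack-number line
                     header = FALSE,
                     colClasses = "character")
stacks <- Map(f = function(x) x[x != ""], crates)
```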

Upvotes: 1

Robert Hacken

Reputation: 4725

You can read your input from a text file with readLines or from a string variable:

L <- readLines('crates.txt')

# OR

L <- '    [D]    
[N] [C]    
[Z] [M] [P]
 1   2   3 '
L <- strsplit(L, '\n')[[1]] 

and then, for example, do this:

# split each line into single characters; sapply() binds them into a matrix
# with one column per input line (this requires all lines to have equal length)
L <- sapply(strsplit(L, ''), identity)
# keep only rows with a number in the last column (and remove that column)
L <- L[grepl('[0-9]', L[, ncol(L)]), -ncol(L)]
# drop empty cells and reverse each row so the bottom crate comes first
L <- apply(L, 1, \(x) rev(x[x!=' ']), simplify=FALSE)

L
# [[1]]
# [1] "Z" "N"
# 
# [[2]]
# [1] "M" "C" "D"
# 
# [[3]]
# [1] "P"
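
The matrix approach depends on all lines having the same length, which can break if an editor trims trailing spaces. A possible way around that, sketched here with a hypothetical helper called read_stacks (not part of any package), is to pad the lines to a common width first:

```r
# Hypothetical helper: pads ragged lines, then applies the matrix approach.
read_stacks <- function(lines) {
  # right-pad every line with spaces to the width of the longest one
  lines <- formatC(lines, width = max(nchar(lines)), flag = "-")
  # character matrix with one column per input line
  m <- do.call(cbind, strsplit(lines, ""))
  # keep rows that carry a digit in the label line, then drop that line
  keep <- grepl("[0-9]", m[, ncol(m)])
  m <- m[keep, -ncol(m), drop = FALSE]
  # one vector per stack, bottom crate first
  lapply(seq_len(nrow(m)), function(i) {
    x <- m[i, ]
    rev(x[x != " "])
  })
}

# Sample input with trailing spaces trimmed
ragged <- c("    [D]", "[N] [C]", "[Z] [M] [P]", " 1   2   3")
stacks <- read_stacks(ragged)
```

With a file, the call would be read_stacks(readLines('crates.txt')), assuming the file contains only the crate drawing and the label line.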

Upvotes: 1
