RoyalTS

Reputation: 10203

indented unordered list to nested list()

I've got a log file that looks as follows:

Data:
 +datadir=/data/2017-11-22
 +Nusers=5292
Parameters:
 +outdir=/data/2017-11-22/out
 +K=20
 +IC=179
 +ICgroups=3
   -group 1: 1-1
    ICeffects: 1-5
   -group 2: 2-173
    ICeffects: 6-10
   -group 3: 175-179
    ICeffects: 11-15

I would like to parse this logfile into a nested list using R so that the result will look like this:

result <- list(Data = list(datadir = '/data/2017-11-22',
                           Nusers = 5292),
               Parameters = list(outdir = '/data/2017-11-22/out',
                                 K = 20,
                                 IC = 179,
                                 ICgroups = list(list('group 1' = '1-1',
                                                      ICeffects = '1-5'),
                                                 list('group 2' = '2-173',
                                                      ICeffects = '6-10'),
                                                 list('group 3' = '175-179',
                                                      ICeffects = '11-15'))))

Is there a not-extremely-painful way of doing this?

Upvotes: 1

Views: 112

Answers (1)

Maurits Evers

Reputation: 50718

Disclaimer: This is messy. There is no guarantee that this will work for larger/different files without some tweaking. You will need to do some careful checking.

The key idea here is to reformat the raw data, to make it consistent with the YAML format, and then use yaml::yaml.load to parse the data to produce a nested list.

By the way, this is an excellent example of why one really should use a common markup language for log-output/config files (like JSON, YAML, etc.)...

I assume you read in the log file using readLines to produce the vector of strings ss.
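
For completeness, reading the file could look roughly like the sketch below. The temporary file and the name log_path are stand-ins I made up so the snippet runs on its own; in practice you would pass the path to your actual log file.

```r
# Hypothetical stand-in for the real log: write two lines to a temporary file
log_path <- tempfile(fileext = ".log")
writeLines(c("Data:", " +datadir=/data/2017-11-22"), log_path)

# readLines() returns a character vector with one element per line
ss <- readLines(log_path)
print(ss)
```

Leading whitespace is preserved by readLines, which matters here because the indentation carries the nesting.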

# Sample data
ss <- c(
    "Data:",
    " +datadir=/data/2017-11-22",
    " +Nusers=5292",
    "Parameters:",
    " +outdir=/data/2017-11-22/out",
    " +K=20",
    " +IC=179",
    " +ICgroups=3",
    "   -group 1: 1-1",
    "    ICeffects: 1-5",
    "   -group 2: 2-173",
    "    ICeffects: 6-10",
    "   -group 3: 175-179",
    "    ICeffects: 11-15")

We then reformat the data to adhere to the YAML format.

# Reformat to adhere to YAML formatting
ss <- gsub("\\+", "- ", ss)                    # Replace "+" with "- " (YAML list item marker)
ss <- gsub("ICgroups=\\d+", "ICgroups:", ss)   # Replace "ICgroups=3" with "ICgroups:" to open a sub-list
ss <- gsub("=", " : ", ss)                     # Replace "=" with " : " (key/value separator)
ss <- gsub("-group", "- group", ss)            # Replace "-group" with "- group"
ss <- gsub("ICeffects", " ICeffects", ss)      # Indent "ICeffects" one more space so it lines up with "group"

Note that – consistent with your expected output – the value 3 from ICgroups doesn't get used, and we need to replace ICgroups=3 with ICgroups: so that YAML starts a nested sub-list. This was the part that threw me off at first...
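
If your real logs contain other counters like ICgroups, hard-coding the key name will not scale. One possible generalisation, sketched here under the assumption that a "+key=number" line opens a sub-list exactly when the next line starts with "-" after its indentation, is shown below. The variable names is_counter and next_is_child are mine, and this step would have to run on the raw lines, before "+" is replaced.

```r
# Small excerpt of the raw log lines (before any other substitution)
ss <- c(" +IC=179",
        " +ICgroups=3",
        "   -group 1: 1-1",
        "    ICeffects: 1-5")

# TRUE where the following line is a "-" child entry; the last line has no successor
next_is_child <- c(grepl("^\\s*-", ss[-1]), FALSE)

# TRUE for lines of the form "+key=<digits>"
is_counter <- grepl("^\\s*\\+\\w+=\\d+$", ss)

# Only counters that are followed by child entries become "key:"
opens_sublist <- is_counter & next_is_child
ss[opens_sublist] <- sub("=\\d+$", ":", ss[opens_sublist])
print(ss)
```

Here " +IC=179" is left alone because the line after it starts with "+", while " +ICgroups=3" turns into " +ICgroups:".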

Loading & parsing the YAML string then produces a nested list.

library(yaml)
lst <- yaml.load(paste(ss, collapse = "\n"))
lst

#$Data
#$Data[[1]]
#$Data[[1]]$datadir
#[1] "/data/2017-11-22"
#
#
#$Data[[2]]
#$Data[[2]]$Nusers
#[1] 5292
#
#
#
#$Parameters
#$Parameters[[1]]
#$Parameters[[1]]$outdir
#[1] "/data/2017-11-22/out"
#
#
#$Parameters[[2]]
#$Parameters[[2]]$K
#[1] 20
#
#
#$Parameters[[3]]
#$Parameters[[3]]$IC
#[1] 179
#
#
#$Parameters[[4]]
#$Parameters[[4]]$ICgroups
#$Parameters[[4]]$ICgroups[[1]]
#$Parameters[[4]]$ICgroups[[1]]$`group 1`
#[1] "1-1"
#
#$Parameters[[4]]$ICgroups[[1]]$ICeffects
#[1] "1-5"
#
#
#$Parameters[[4]]$ICgroups[[2]]
#$Parameters[[4]]$ICgroups[[2]]$`group 2`
#[1] "2-173"
#
#$Parameters[[4]]$ICgroups[[2]]$ICeffects
#[1] "6-10"
#
#
#$Parameters[[4]]$ICgroups[[3]]
#$Parameters[[4]]$ICgroups[[3]]$`group 3`
#[1] "175-179"
#
#$Parameters[[4]]$ICgroups[[3]]$ICeffects
#[1] "11-15"
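
The list printed above wraps every "+" entry in its own one-element list, whereas the question asks for plain named lists under Data and Parameters. A minimal sketch of one way to collapse those wrappers in base R follows. merge_singletons is a hypothetical helper name of mine, and lst is a hand-built, shortened copy of the parsed structure so that the snippet runs without the yaml package.

```r
# Hand-built, shortened copy of the structure returned by yaml.load()
lst <- list(
  Data = list(list(datadir = "/data/2017-11-22"), list(Nusers = 5292L)),
  Parameters = list(
    list(K = 20L),
    list(ICgroups = list(
      list(`group 1` = "1-1", ICeffects = "1-5"),
      list(`group 2` = "2-173", ICeffects = "6-10")
    ))
  )
)

# Recursively merge an unnamed list of one-element named lists into one named list
merge_singletons <- function(x) {
  if (!is.list(x)) return(x)
  x <- lapply(x, merge_singletons)  # handle the children first
  is_singleton <- function(el) {
    is.list(el) && length(el) == 1L && !is.null(names(el))
  }
  if (is.null(names(x)) && length(x) > 0L &&
      all(vapply(x, is_singleton, logical(1)))) {
    x <- do.call(c, x)  # concatenate the wrappers into a single named list
  }
  x
}

res <- merge_singletons(lst)
str(res)
```

The ICgroups entries are left as a list of groups because each of them holds two keys. A group with only a single key would be merged as well, so check the result against your real files.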

PS. You will need to test this on larger files, and adjust the substitutions if necessary.

Upvotes: 2
