Reputation: 13903
I'm asking how a programmer would approach this task because I'm not really a programmer. I'm a grad student studying quantitative social science, and while I've been programming consistently for a year now I have no formal training.
I'm not concerned about implementing a generic algorithm. I'm happy to work in Bash, AWK, R, or Python. I've also written small snippets of code (beyond "hello world," but not much further) in Java, C, JavaScript, and Matlab. However, if there's some language or some feature of a language that would make this task easier or more natural, I'd love to know about it.
Instead, I'm interested in algorithms and data structures. What do I grab, when do I grab it, where do I save it, etc.? I imagine I could probably do all this with a few cleverly-constructed regular expressions, and I am pretty comfortable with intermediate-level regex features like lookarounds, but anything I make up on my own will undoubtedly be hacky and ad-hoc.
What I have is code (it happens to be in R) that looks something like this, where #
's indicate comments:
items = list(
day1 = list(
# a apples
# b oranges
# c pears
# d red grapes
# m.
# 1 peanuts
# 2 cashews
type1 = c("a", "b", "d", "m.2") # this returns a vector of strings
type2 = c("c", "m.1")
), # this returns a list of vectors
day2 = list(
# a apples
# b oranges
# c pears
# d red grapes
# e plums
# m.
# 1 peanuts
# 2 cashews
# 3 pistachios
type1 = c("a", "b", "d", "e", "m.2")
type2 = c("c", "m.1", "m.3")
)
) # this returns a list of lists of vectors
and what I would like instead is code that looks like this:
items = list(
day1 = list(
type1 = c(
"apples" = "a",
"oranges" = "b",
"red grapes" = "d",
"cashews" = "m.2"
),
type2 = c(
"pears" = "c",
"peanuts" = "m.1"
)
),
day2 = list(
type1 = c(
"apples" = "a",
"oranges" = "b",
"red grapes" = "d",
"plums" = "e",
"cashews" = "m.2"
),
type2 = c(
"pears" = "c",
"peanuts" = "m.1",
"pistachios" = "m.3"
)
)
)
Some things to note:
day1
being "nested" inside the naming for day2
. Some of the letters might swap around.type
s within day
s.So, how would a programmer approach the task of programmatically turning the first code snippet into the second? I can have it done in about 15 minutes of copying and pasting, but I'd like to learn something here. And again, I'm not asking for pre-written code, I'm just looking for some direction since right now I'm just groping in the dark.
Upvotes: 0
Views: 98
Reputation: 46394
Given your sample of code it should be doable by putting together a transformation that consists of a couple steps. At a high level you'd need to read the comments into a data collection that you can query, then parse the code and do a find/replace referencing the data collection.
Without getting in too deep this might look like:
^\s*#.*$
) would give you a result like:# a apples # b oranges # c pears # d red grapes # m. # 1 peanuts # 2 cashews # a apples # b oranges # c pears # d red grapes # e plums # m. # 1 peanuts # 2 cashews # 3 pistachios
m.
cases requires some assumptions. Based on your sample I'd start with some pseudocode like:For each line Get the first character after the # and call it "key" Find the word after the letter and call it "value" If the key is a letter Add "key" => "value" to the dictionary Next line If the key is a number Get the last key added to the dictionary and call it as "parentkey" Add "parentkey"+"key" => "value" to the dictionary Next line
This would give you a structure like this:
{
"a": "apples",
"b": "oranges",
"c": "pears",
"d": "red grapes",
"m.": "",
"m.1": "peanuts",
"m.2": "cashews",
"a": "apples",
"b": "oranges",
"c": "pears",
"d": "red grapes",
"e": "plums",
"m.": "",
"m.1": "peanuts",
"m.2": "cashews",
"m.3": "pistachios"
}
You can clean out the empty "m." entries by iterating over it and removing items with an empty value.
For each dictionary entry (key, value) Find strings like "key" and replace with strings like "value" = "key"
All in all it's not terribly efficient or elegant, but it's not difficult to code and should work. Granted there's probably additional details to consider (there always are) but this is a fairly simple approach given your samples.
Upvotes: 1
Reputation: 46497
I would use a quick regular expression substitution to reduce the work to do and then fix it up by hand. For example you get over halfway there with:
s/# (\w+) ([\w ]+)/"\2" = "\1"/
The exact regular expression to write and how you use it depends on your tool. Different editors and programming languages are very different. Google for what you are using to learn more. (You may have multiple easy options - a Python command line would use one syntax, and the vi editor a different one.)
If you have to do this task regularly or for more code, then you would need to learn about parsing. Which is a lot more work (too much to be worthwhile for something like this if you don't have code sitting around to do it) but also a lot more powerful in the long run.
Upvotes: 0