Reputation: 3940

Consistently separate values in array

After uploading a file, I put each line of the file into an array. I only want to keep part of each element. Here is an example of the array:

[ "1, \"Hlavní\"\n", "2, OK\n", "3618, \"Duplicitní záznamy\"\n", "3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]

I want to trim the strings down to...

[ "Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??" ]

The one thing I can be certain of is that the text is always either between \" and \", or between , and \n. I have tried grabbing the text but don't know how to make the match that precise.

Here is one of the uploaded files:

#INDEX  STRING
0, "Deutsch"
# Main
1, "Hauptmenü"
2, "Sonstiges"
3, "Kontrolle"
4, "Datei Ansicht"
5, "Laden..."
6, "Registriert"
7, "Nicht registriert"
8, "Ja"
9, "Nein"
10, "Anrufen"
11, "Suchen"
12, "Neu"
13, "Bearbeiten"
14, "Löschen"
15, "Alle löschen"
16, "Zurück"
17, "Zurück zum Hauptverzeichnis"
18, "Optionen"
19, "Speichern"

And another

#Comment 1-500
1, Ende
2, OK
3, Abbrechen
4, Senden
5, Ja
6, Nein
7, Ein
8, Aus
9, Start
10, Stopp
11, Pause
12, Standard
13, Alle
14, Titel
15, Benutzerdefinierte Sprache

#Call 501-999
501, Telefon
503, Wählen...
504,
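For context, an array like the one at the top of the question is what Ruby's String#each_line produces when it splits file contents into lines (File.readlines behaves the same way). Every element keeps its trailing \n, which is why the text to extract ends either at a closing \" or at the newline. A minimal sketch, assuming the file has already been read into a string; the variable names are illustrative:

```ruby
# Illustrative sample: the first lines of the second file shown above.
contents = "#Comment 1-500\n1, Ende\n2, OK\n3, Abbrechen\n"

# String#each_line splits after each "\n" and keeps the newline on each element.
line_array = contents.each_line.to_a
p line_array
#=> ["#Comment 1-500\n", "1, Ende\n", "2, OK\n", "3, Abbrechen\n"]
```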

Upvotes: 0

Answers (3)

Nafaa Boutefer

Reputation: 2359

I think the selected answer is not the best; here is a better one.

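# File.read accepts a path, so no File object needs to be opened (and left open) first
@messages = File.read(@request.attachment.path)
              .scan(/,[ \t]+"?([^"\n]*)(?:"|\n)/)  # start at the comma; [ \t] cannot cross a newline the way \s can
              .flatten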
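A self-contained sketch of this scan approach, run on an in-memory copy of the first few lines of the first sample file rather than an upload (variable names are illustrative). It compares a pattern that begins with \s+ against one anchored on the comma. Because \s also matches the newline ending the previous line, the \s+ version picks up the header and comment lines and can start a match at the beginning of the next line:

```ruby
# In-memory sample: the start of the first file from the question.
text = <<~TXT
  #INDEX  STRING
  0, "Deutsch"
  # Main
  1, "Hauptmenü"
  2, "Sonstiges"
TXT

# Leading \s+: the match may begin on the newline that ends the previous line.
loose_messages = text.scan(/\s+"?([^"\n]*)(?:"|\n)/).flatten
p loose_messages
#=> ["STRING", "Deutsch", "# Main", "Hauptmenü", "2, "]

# Anchored on ", ": comment lines have no comma, and [ \t] stays on one line.
messages = text.scan(/,[ \t]+"?([^"\n]*)(?:"|\n)/).flatten
p messages
#=> ["Deutsch", "Hauptmenü", "Sonstiges"]
```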

Upvotes: 0

Cary Swoveland

Reputation: 110725

This is one option:

arr = [ "1, \"Hlavní\"\n",
        "2, OK\n",
        "3618, \"Duplicitní záznamy\"\n",
        "3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]

r = /,\s+"\K.+?(?=")|,\s+\K.+?(?=\n)/    
arr.map { |s| s[r] }
  #=> ["Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]

I've required the string to be preceded either by a comma, whitespace and a double quote, or by a comma and whitespace. The former is somewhat stronger than the specified match requirement (which only asks for the text to sit between \" and \"); if that is inappropriate, it can be weakened in the obvious way. I used \K (which drops everything matched so far from the reported match) rather than a positive lookbehind to allow for varying amounts of whitespace after the comma: Ruby's lookbehinds must be fixed-width, so (?<=,\s+") is not allowed.
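A quick way to check the lookbehind restriction (a sketch; the sample string is made up): compiling a lookbehind that contains \s+ raises RegexpError, while \K accepts a prefix of any length.

```ruby
# Onigmo (Ruby's regex engine) only allows fixed-width lookbehind,
# so this pattern is rejected when it is compiled.
lookbehind_error =
  begin
    Regexp.new('(?<=,\s+)\w+')
    nil
  rescue RegexpError => e
    e.message
  end
puts "lookbehind rejected: #{lookbehind_error.inspect}"

# \K has no such restriction: the comma and the whole run of spaces are
# matched, then dropped from the reported match.
kept = "2,    OK\n"[/,\s+\K\w+/]
p kept #=> "OK"
```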

Let's take a closer look at the regex. By adding the x ("extended") modifier at the end, we can spread it over several commented lines:

r = /
  ,\s+   # match a comma followed by one or more whitespace chars
  "      # match `"`
  \K     # forget what has been matched previously
  .+?    # match one or more characters (any except newline), lazily
  (?=")  # match must be immediately followed by `"` (positive lookahead)
  |      # alternation ("or"): match either the part above or the part below
  ,\s+   # as above
  \K     # as above
  .+?    # as above
  (?=\n) # match to be immediately followed by `\n` (positive lookahead)
/x

Let's confirm that the extended form gives the same result:

arr.map { |s| s[r] }
  #=> ["Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]

Note:

the ? following .+ makes the match lazy ("non-greedy"), so it stops as soon as the next element of the pattern (" or \n) can match, rather than gobbling up everything until it finds the last " or \n in the string;
the two positive lookaheads are "zero-width", meaning they do not consume characters and are not part of the match.
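Both points can be seen on a made-up line that contains two quoted strings; only the ? differs between the two patterns in this sketch. In both results the closing quote is left out, because the lookahead is zero-width.

```ruby
s = "7, \"a\" and \"b\"\n"  # hypothetical input with two quoted strings

# Greedy: .+ first runs to the end of the line, then backs off until the
# lookahead (?=") succeeds, i.e. at the last quote.
greedy = s[/,\s+"\K.+(?=")/]
# Lazy: .+? grows one character at a time and stops at the first quote.
lazy = s[/,\s+"\K.+?(?=")/]

p greedy #=> "a\" and \"b"
p lazy   #=> "a"
```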

This could alternatively be written:

arr.map { |s| s[/,\s+\K(?:"\K.+?(?=")|.+?(?=\n))/] }
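If \K feels obscure, a capture group can do the same job. This sketch is not part of the answer above; it uses String#[] with a group index, which returns only that group's text. [^"\n]* stops at a closing quote or at the end of the line. Unlike the regexes above, it does not insist on a closing quote.

```ruby
arr = [ "1, \"Hlavní\"\n",
        "2, OK\n",
        "3618, \"Duplicitní záznamy\"\n",
        "3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]

# s[regexp, 1] returns capture group 1 (or nil if the regexp does not match).
result = arr.map { |s| s[/,\s+"?([^"\n]*)/, 1] }
p result
#=> ["Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]
```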

Upvotes: 2

sawa

Reputation: 168199

[ "1, \"Hlavní\"\n", "2, OK\n", "3618, \"Duplicitní záznamy\"\n", "3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]
.map{|s| s[/(?<=")[^"]*(?=")/]}
# => ["Hlavní", nil, "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]

Note that the second element in the result is nil as per your request (extracting the element between a \" and \").
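If the unquoted lines should be handled too, one hypothetical extension is to fall back to a second pattern whenever the first returns nil. This sketch assumes exactly one space after the comma, since Ruby's lookbehinds must be fixed-width:

```ruby
arr = [ "1, \"Hlavní\"\n", "2, OK\n", "3618, \"Duplicitní záznamy\"\n", "3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]

# Quoted text first; if there are no quotes, take the rest of the line after ", ".
# (. does not match "\n", so the trailing newline is left out.)
result = arr.map { |s| s[/(?<=")[^"]*(?=")/] || s[/(?<=, ).*/] }
p result
#=> ["Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]
```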

Upvotes: 1
