Reputation: 3940
After uploading a file, I put each line of the file into an array. I only want to keep part of each string. Here is an example of the array:
[ "1, \"Hlavní\"\n", "2, OK\n", "3618, \"Duplicitní záznamy\"\n", "3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]
I want to trim the strings down to:
[ "Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??" ]
The one thing I can be certain of is that the text is always either between a \" and a \", or between a , and a \n. I have tried grabbing the text but don't know how to match it that precisely.
Here is one of the uploaded files:
#INDEX STRING
0, "Deutsch"
# Main
1, "Hauptmenü"
2, "Sonstiges"
3, "Kontrolle"
4, "Datei Ansicht"
5, "Laden..."
6, "Registriert"
7, "Nicht registriert"
8, "Ja"
9, "Nein"
10, "Anrufen"
11, "Suchen"
12, "Neu"
13, "Bearbeiten"
14, "Löschen"
15, "Alle löschen"
16, "Zurück"
17, "Zurück zum Hauptverzeichnis"
18, "Optionen"
19, "Speichern"
And another
#Comment 1-500
1, Ende
2, OK
3, Abbrechen
4, Senden
5, Ja
6, Nein
7, Ein
8, Aus
9, Start
10, Stopp
11, Pause
12, Standard
13, Alle
14, Titel
15, Benutzerdefinierte Sprache
#Call 501-999
501, Telefon
503, Wählen...
504,
Upvotes: 0
Views: 72
Reputation: 2359
I don't think the selected answer is the best one; here is an alternative.
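# Read the upload straight from its path (no File handle is left open) and anchor
# the pattern to each line, so comment lines and "N, " prefixes are never captured.
@messages = File.read(@request.attachment.path).scan(/^\d+,[ \t]*"?([^"\n]*)"?$/).flatten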
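A whole-file scan is easier to reason about when the pattern is anchored to each line, so that comment lines are skipped and a match never runs across a newline. The following is a minimal sketch of that idea, not the poster's exact code: SAMPLE_TEXT is a hypothetical stand-in for the uploaded file's contents, and the names LINE_VALUE and extract_values are made up for illustration.

```ruby
# Hypothetical sample standing in for the uploaded file's contents.
SAMPLE_TEXT = <<~TEXT
  #INDEX STRING
  0, "Deutsch"
  # Main
  1, "Hauptmenü"
  2, OK
  504,
TEXT

# ^ and $ anchor each match to a single line. Lines that do not start
# with digits followed by a comma (such as comments) never match.
LINE_VALUE = /^\d+,[ \t]*"?([^"\n]*)"?$/

# Returns the captured value of every matching line, in order.
def extract_values(text)
  text.scan(LINE_VALUE).flatten
end

p extract_values(SAMPLE_TEXT)
#=> ["Deutsch", "Hauptmenü", "OK", ""]
```

Entries with no text after the comma (such as 504,) come back as empty strings; if those are unwanted, appending .reject(&:empty?) would drop them.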
Upvotes: 0
Reputation: 110725
This is one option:
arr = [ "1, \"Hlavní\"\n",
"2, OK\n",
"3618, \"Duplicitní záznamy\"\n",
"3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]
r = /,\s+"\K.+?(?=")|,\s+\K.+?(?=\n)/
arr.map { |s| s[r] }
#=> ["Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]
I've required the string to be preceded by either a comma, whitespace and a double quote (, \") or by a comma and whitespace (, ). The former is somewhat stronger than the stated requirement; if that is inappropriate, it can be weakened, for example by dropping the leading ,\s+ from the first alternative. I used \K (match what comes before, but do not include it in the returned match) rather than a positive lookbehind, because the amount of whitespace after the comma may vary and a lookbehind cannot contain a variable-length pattern such as \s+.
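To illustrate the point about varying whitespace, here is a small sketch that applies the same pattern to lines with one space, several spaces, and a tab after the comma. The sample strings and the constant names are hypothetical and only serve the demonstration.

```ruby
# Same pattern as in the answer above.
VALUE_AFTER_COMMA = /,\s+"\K.+?(?=")|,\s+\K.+?(?=\n)/

# Hypothetical inputs: one space, four spaces, and a tab after the comma.
SPACING_SAMPLES = ["1, \"a\"\n", "2,    \"b\"\n", "3,\tc\n"]

# \K drops the comma, whitespace and opening quote from the returned match,
# however much whitespace \s+ consumed.
p SPACING_SAMPLES.map { |s| s[VALUE_AFTER_COMMA] }
#=> ["a", "b", "c"]
```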
Let's take a closer look at the regex. By adding the x ("extended") flag at the end, we can spread it over several commented lines:
r = /
  ,\s+    # match a comma followed by one or more whitespace chars
  "       # match `"`
  \K      # discard everything matched so far from the returned match
  .+?     # match one or more of any character, lazily
  (?=")   # match must be immediately followed by `"` (positive lookahead)
  |       # or
  ,\s+    # as above
  \K      # as above
  .+?     # as above
  (?=\n)  # match must be immediately followed by `\n` (positive lookahead)
/x
Let's confirm the regex can be written this way:
arr.map { |s| s[r] }
#=> ["Hlavní", "OK", "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]
Note: the ? following .+ makes the match lazy ("non-greedy"), so it stops as soon as the next element of the pattern (" or \n) is found, rather than gobbling up everything until it reaches the last " or \n in the string.
This could alternatively be written:
arr.map { |s| s[/,\s+\K(?:"\K.+?(?=")|.+?(?=\n))/] }
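The difference between the lazy and greedy forms mentioned in the note only shows up when a line contains more than one closing delimiter. A short sketch, using a hypothetical line with two quoted sections (the string and constant names are made up for illustration):

```ruby
# Hypothetical line containing two quoted sections.
TWO_QUOTED = "7, \"first\" and \"second\"\n"

LAZY   = /,\s+"\K.+?(?=")/  # stops at the first following double quote
GREEDY = /,\s+"\K.+(?=")/   # runs on to the last double quote on the line

p TWO_QUOTED[LAZY]    #=> "first"
p TWO_QUOTED[GREEDY]  #=> "first\" and \"second"
```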
Upvotes: 2
Reputation: 168199
[ "1, \"Hlavní\"\n", "2, OK\n", "3618, \"Duplicitní záznamy\"\n", "3619, \"Anyth1ng_ Go@es /n th7s'me??\"\n" ]
.map{|s| s[/(?<=")[^"]*(?=")/]}
# => ["Hlavní", nil, "Duplicitní záznamy", "Anyth1ng_ Go@es /n th7s'me??"]
Note that the second element in the result is nil, as per your request (extracting the element between a \" and a \").
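If the nil entries are not wanted, for instance for comment lines such as #INDEX STRING or for unquoted lines, they can be dropped while mapping. This is a minimal sketch assuming Ruby 2.7 or later (for filter_map); UPLOADED_LINES is a hypothetical sample, and QUOTED and quoted_values are illustrative names only.

```ruby
# Hypothetical lines as they might come from one of the uploaded files.
UPLOADED_LINES = [
  "#INDEX STRING\n",
  "0, \"Deutsch\"\n",
  "# Main\n",
  "1, \"Hauptmenü\"\n",
  "2, OK\n"
]

# Same pattern as above: text between two double quotes.
QUOTED = /(?<=")[^"]*(?=")/

# filter_map keeps only truthy block results, so comment lines and
# lines without a quoted value (nil) are dropped.
def quoted_values(lines)
  lines.filter_map { |line| line[QUOTED] unless line.start_with?("#") }
end

p quoted_values(UPLOADED_LINES)
#=> ["Deutsch", "Hauptmenü"]
```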
Upvotes: 1