Reputation: 27380
I have the following JSON.stringify(text) that I extracted from a voucher (text is the variable name) in javascript:
" \nVehicle Details \nPassenger Details \nEconomy Car \nMaximum Passengers 4 \nSuitcases capacity 4 \nFirst Name \nLeif \nEmail \[email protected] \nLast Name \nLast Name \nBeermer \nBeermer \nMobile Phone Number \n46712 125 313 \n46712 125 123 \nPassengers \nAdults 1 \nChildren 0 \nInfants 0 \nAdditional Options \nno_extras_in_voucher \nPayment \nPayment Method Credit Card \nAmount Paid 60 € \nAmount pending 0 € \nArrival \nDrop off Location Divani Palace Acropolis \nFlight Arrival Time 12:55 AM \nAirline SsS \nFlight Number SK717 \nOriginating Airport (Where your flight is from?) Copenhagen \nPickup Location Athens Airport \nReturn \nReturn \nDrop-Off Location Athens Airport \nDrop-Off Location Athens Airport \nFlight Departure Time 13:45 \nFlight Departure Time 13:45 \nAirline SAS \nAirline SAS \nFlight Number SK778 \nFlight Number SK778 \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Location Divani Palace Acropolis \nPick Up Location Divani Palace Acropolis \nBooking Code: 7777 Booking Date: 22/03/2019 09:22 Total Cost: 60 € \nArrival Flight Date & Time 28/03/2019 \nAccommodation Name Divani Palace Acropolis \nAccommodation Address Parthenonos 19, Athina 117 42, Greece \nComments \nFlight Departure Date 29/03/2019 \nAccommodation Name Divani Palace Acropolis \nAccommodation Address Parthenonos 19, Athina 117 42, Greece \nComments
I would like to get the words that are in bold. The words that are not bolded are fixed; that is, every voucher has exactly the same format except for the bold words. As you can see, there are a lot of duplicate words, and some of the values might be two or even three words long (e.g. Economy Car or hotel amsterdam). What I am doing right now is trying to get the text between two strings. For example, to get the text Economy Car I would use this regex:
text.match(/Details ([\s\S]*?) Maximum/)
But this returns null, and I assume that is because there are many values within the strings or because there are duplicate words. I would like to avoid for loops, since I am using Google Apps Script and there is a runtime limit.
Upvotes: 0
Views: 142
Reputation: 350725
The text looks like the text representation of what was originally HTML. This could mean that some space characters are really other white space, like TAB or newline characters. So you'd better use \s+ in your regular expressions. As a side note: if you have access to the HTML, then it is better to rely on the HTML instead of its text representation.
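As a minimal sketch of that advice, the pattern from the question can be rewritten with \s+ between the tokens. The sample string is a shortened, hypothetical excerpt of the voucher text, and anchoring on "Passenger Details" rather than "Details" is an assumption made here so that the capture starts right before the vehicle type.

```javascript
// Shortened, hypothetical excerpt of the voucher text (note the " \n" separators).
var sample = " \nVehicle Details \nPassenger Details \nEconomy Car \nMaximum Passengers 4 \n";

// A single literal space cannot match the " \n" before "Maximum", so this is null.
var strict = sample.match(/Details ([\s\S]*?) Maximum/);

// \s+ accepts any run of white space (space, TAB, newline).
var loose = sample.match(/Passenger Details\s+([\s\S]*?)\s+Maximum/);

console.log(strict);            // null
console.log(loose && loose[1]); // "Economy Car"
```

The original pattern fails here because "Maximum" is preceded by a newline, not by a space.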
You could list the field labels and take the text that occurs between them. Some extra logic is needed to ignore empty values, repeating values, or skip over possibly missing labels without breaking the rest of the process.
Still this process relies heavily on the assumption you stated:
The words that are not bolded are fixed. Namely, every voucher has the same exact format
This code produces field/value pairs. Because the fields (as they occur in the input) are not unique, the results are put in an array, not in an object keyed by field labels:
// Input data
var text = " \nVehicle Details \nPassenger Details \nEconomy Car \nMaximum Passengers 4 \nSuitcases capacity 4 \nFirst Name \nTerf \nEmail \[email protected] \nLast Name \nLast Name \nNick \nNick \nMobile Phone Number \n43702 136 845 \n43702 136 845 \nPassengers \nAdults 2 \nChildren 0 \nInfants 0 \nAdditional Options \nno_extras_in_voucher \nPayment \nPayment Method Credit Card \nAmount Paid 60 € \nAmount pending 0 € \nArrival \nDrop off Location Hotel Acropolis \nFlight Arrival Time 12:55 AM \nAirline SKG \nFlight Number SK732 \nOriginating Airport (Where your flight is from?) Amsterdam \nPickup Location Athens Airport \nReturn \nReturn \nDrop-Off Location Athens Airport \nDrop-Off Location Athens Airport \nFlight Departure Time 13:45 \nFlight Departure Time 13:45 \nAirline SKG \nAirline SKG \nFlight Number SK732 \nFlight Number SK732 \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Location Hotel Acropolis \nPick Up Location Hotel Acropolis \nBooking Code: 744 Booking Date: 22/03/2019 09:22 Total Cost: 60 € \nArrival Flight Date & Time 28/03/2019 \nAccommodation Name Hotel Acropolis \nAccommodation Address Parth 11, Athina 117 42, Greece \nComments \nFlight Departure Date 29/03/2019 \nAccommodation Name Hotel Acropolis \nAccommodation Address Parthen 19, Athina 117 42, Greece \nComments "
var fields = [
"Vehicle Details", "Passenger Details", "Maximum Passengers",
"Suitcases capacity", "First Name", "Email", "Last Name",
"Last Name", "Mobile Phone Number", "Passengers", "Adults",
"Children", "Infants", "Additional Options", "Payment",
"Payment Method", "Amount Paid", "Amount pending", "Arrival",
"Drop off Location", "Flight Arrival Time", "Airline",
"Flight Number", "Originating Airport (Where your flight is from?)",
"Pickup Location", "Return", "Return", "Drop-Off Location",
"Drop-Off Location", "Flight Departure Time",
"Flight Departure Time", "Airline", "Airline", "Flight Number",
"Flight Number", "Pick Up Time From Your Accommodation",
"Pick Up Time From Your Accommodation",
"Pick Up Time From Your Accommodation",
"Pick Up Location", "Pick Up Location", "Booking Code:",
"Booking Date:", "Total Cost:", "Arrival Flight Date & Time",
"Accommodation Name", "Accommodation Address",
"Comments", "Flight Departure Date", "Accommodation Name",
"Accommodation Address", "Comments"
];
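var result = fields.reduceRight(function (acc, field, j) {
    // acc[0] is the text not yet consumed, acc[1] holds the pairs found so far
    var i = acc[0].lastIndexOf(field);
    var value = acc[0].slice(i+field.length).trim().split("\n")[0].trim();
    // When the label is missing, keep the text intact instead of slicing with -1
    return [i<0 ? acc[0] : acc[0].slice(0, i),
        i<0 || !value || field==fields[j+1]
            ? acc[1]
            : [{ field: field, value: value }].concat(acc[1])];
}, [text, []]).pop();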
console.log(result);
The output structure is an array of objects where each object has a field and value property. This means you need to iterate the array to find a certain field. It would be nicer if the output were a plain object where you can access the values by their key. The problem is that the fields are not unique (like "Flight Number").
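A minimal sketch of such a lookup on the array output, assuming you want every value recorded for a label. The `valuesOf` helper and the shortened `result` array are hypothetical and only illustrate the shape of the data.

```javascript
// Hypothetical excerpt of the array of field/value pairs.
var result = [
  { field: "Airline", value: "SKG" },
  { field: "Flight Number", value: "SK732" },
  { field: "Airline", value: "SKG" },
  { field: "Flight Number", value: "SK732" }
];

// Hypothetical helper: collect every value recorded for a given label.
function valuesOf(pairs, label) {
  return pairs
    .filter(function (pair) { return pair.field === label; })
    .map(function (pair) { return pair.value; });
}

console.log(valuesOf(result, "Flight Number")); // ["SK732", "SK732"]
console.log(valuesOf(result, "Comments"));      // []
```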
Here is an alternative solution where such fields will get an array of values:
// Input data
var text = " \nVehicle Details \nPassenger Details \nEconomy Car \nMaximum Passengers 4 \nSuitcases capacity 4 \nFirst Name \nTerf \nEmail \[email protected] \nLast Name \nLast Name \nNick \nNick \nMobile Phone Number \n43702 136 845 \n43702 136 845 \nPassengers \nAdults 2 \nChildren 0 \nInfants 0 \nAdditional Options \nno_extras_in_voucher \nPayment \nPayment Method Credit Card \nAmount Paid 60 € \nAmount pending 0 € \nArrival \nDrop off Location Hotel Acropolis \nFlight Arrival Time 12:55 AM \nAirline SKG \nFlight Number SK732 \nOriginating Airport (Where your flight is from?) Amsterdam \nPickup Location Athens Airport \nReturn \nReturn \nDrop-Off Location Athens Airport \nDrop-Off Location Athens Airport \nFlight Departure Time 13:45 \nFlight Departure Time 13:45 \nAirline SKG \nAirline SKG \nFlight Number SK732 \nFlight Number SK732 \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Time From Your Accommodation 11:00 AM \nPick Up Location Hotel Acropolis \nPick Up Location Hotel Acropolis \nBooking Code: 744 Booking Date: 22/03/2019 09:22 Total Cost: 60 € \nArrival Flight Date & Time 28/03/2019 \nAccommodation Name Hotel Acropolis \nAccommodation Address Parth 11, Athina 117 42, Greece \nComments \nFlight Departure Date 29/03/2019 \nAccommodation Name Hotel Acropolis \nAccommodation Address Parthen 19, Athina 117 42, Greece \nComments "
var fields = [
"Vehicle Details", "Passenger Details", "Maximum Passengers",
"Suitcases capacity", "First Name", "Email", "Last Name",
"Last Name", "Mobile Phone Number", "Passengers", "Adults",
"Children", "Infants", "Additional Options", "Payment",
"Payment Method", "Amount Paid", "Amount pending", "Arrival",
"Drop off Location", "Flight Arrival Time", "Airline",
"Flight Number", "Originating Airport (Where your flight is from?)",
"Pickup Location", "Return", "Return", "Drop-Off Location",
"Drop-Off Location", "Flight Departure Time",
"Flight Departure Time", "Airline", "Airline", "Flight Number",
"Flight Number", "Pick Up Time From Your Accommodation",
"Pick Up Time From Your Accommodation",
"Pick Up Time From Your Accommodation",
"Pick Up Location", "Pick Up Location", "Booking Code:",
"Booking Date:", "Total Cost:", "Arrival Flight Date & Time",
"Accommodation Name", "Accommodation Address",
"Comments", "Flight Departure Date", "Accommodation Name",
"Accommodation Address", "Comments"
];
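var result = fields.reduceRight(function (acc, field, j) {
    // acc[0] is the text not yet consumed, acc[1] the object being built
    var i = acc[0].lastIndexOf(field);
    var value = acc[0].slice(i+field.length).trim().split("\n")[0].trim();
    // When the label is missing, keep the text intact instead of slicing with -1
    var text = i<0 ? acc[0] : acc[0].slice(0, i);
    if (i<0 || !value || field==fields[j+1]) return [text, acc[1]];
    // A repeated label gets an array of values; a unique one keeps a plain string
    acc[1][field] = field in acc[1] ? [].concat(acc[1][field], value) : value;
    return [text, acc[1]];
}, [text, {}]).pop();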
console.log(result);
Now you can get for instance the "Flight Departure Date" as follows:
console.log(result["Flight Departure Date"]);
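Since a property holds a plain string when the label occurs once and an array when it repeats, the calling code may want a uniform shape. Here is a minimal sketch, assuming a hypothetical `asArray` helper and a shortened `result` object. Note that the reduceRight code walks the text from the end, so arrays hold the last occurrence first.

```javascript
// Hypothetical excerpt of the object keyed by field labels.
var result = {
  "Flight Departure Date": "29/03/2019",
  "Flight Number": ["SK732", "SK732"]
};

// Hypothetical helper: always return an array, whatever shape is stored.
function asArray(obj, label) {
  return label in obj ? [].concat(obj[label]) : [];
}

console.log(asArray(result, "Flight Departure Date")); // ["29/03/2019"]
console.log(asArray(result, "Flight Number").length);  // 2
console.log(asArray(result, "Comments"));              // []
```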
Upvotes: 0
Reputation: 370
Update: the code now works with Apps Script. This assumes that you need a script to parse multiple similar strings, and that only the text in bold changes.
The basic algorithm is to start from the end and parse field by field. You would need an array of field names:
var fields = [
"Vehicle Details Passenger Details",
"Maximum Passengers",
//...
"Airline",
"Airline SEK Flight Number"
]
Then loop over the fields, assuming your string is in the str variable:
var values = [];
for(var i = fields.length - 1; i > -1; i--){
var indexOfField = str.lastIndexOf(fields[i]);
var fieldLength = fields[i].length;
var value = str.substr(indexOfField + fieldLength);
values.push(value);
str = str.substr(0, indexOfField);
}
Logger.log(values)
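The loop above collects the values in reverse order and without their labels. A minimal sketch of a variant that pairs each label with its trimmed value and skips labels that are not found follows. The input string, the field list and the `pairs` name are shortened, hypothetical examples, and `console.log` stands in for `Logger.log` so the snippet also runs outside Apps Script.

```javascript
// Shortened, hypothetical input and field list.
var str = " \nAdults 2 \nChildren 0 \nInfants 0 \n";
var fields = ["Adults", "Children", "Missing Label", "Infants"];

var pairs = [];
for (var i = fields.length - 1; i >= 0; i--) {
  var index = str.lastIndexOf(fields[i]);
  if (index < 0) continue;                           // label absent: leave str untouched
  var value = str.slice(index + fields[i].length).trim();
  pairs.unshift({ field: fields[i], value: value }); // unshift restores document order
  str = str.slice(0, index);                         // drop the part already parsed
}

console.log(JSON.stringify(pairs));
// [{"field":"Adults","value":"2"},{"field":"Children","value":"0"},{"field":"Infants","value":"0"}]
```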
Upvotes: 1