NoSoup4you

How to parse data out of a string

I have a function that retrieves a string from another website. After extracting it, I end up with the following string:

IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEMP12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37

In this case there are 3 records with 6 fields each, all separated by spaces. How can I read the string and put the records into a structure and array so I can access them?

The fields are laid out like this:

  1. IFX
  2. TMP2134567 (this field may contain a space)
  3. 1433010010
  4. WT33
  5. PARTIAL
  6. 2014-11-26 09:43:58

So if I use " " as the separator, I get 7 fields, because the 6th field is a date/time with a space in it. Using 7 would be fine too, since I can either put fields 6 and 7 back together or store the date and time separately.

My question: is there a way to do this with 6 fields? Or, if I have to use 7, how would I do that? I tried valuelist, but that does not work.
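A minimal sketch of the 7-field route, assuming one record has already been cut out of the full string and that field 2 contains no spaces (as in the first two records). `listToArray()` splits the record on the space delimiter, and tokens 6 and 7 are rejoined into a single date/time value while also being kept separately. The variable and key names (`recordText`, `field1`, `recordDate`, etc.) are illustrative only.

```cfml
<cfscript>
// Assumes a single record has already been isolated and that
// field 2 has no embedded spaces, so there are exactly 7 tokens
recordText = "IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58";
tokens = listToArray(recordText, " ");

rec = {
    field1 = tokens[1],
    field2 = tokens[2],
    field3 = tokens[3],
    field4 = tokens[4],
    field5 = tokens[5],
    // tokens 6 and 7 are the date and the time; rejoin them into one value
    field6 = tokens[6] & " " & tokens[7],
    // ...or keep them apart if they are stored separately
    recordDate = tokens[6],
    recordTime = tokens[7]
};
writeDump(rec);
</cfscript>
```

Because `listToArray()` skips empty elements by default, a run of several spaces between fields would not produce extra tokens.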

I know a couple of things about each record: the 1st field is always 3 characters, the 4th is always 4 characters, and the record ends with a date/time in the format YYYY-MM-DD HH:MM:SS.

To make it a bit more complicated, I just found that the 2nd field can contain spaces, as in the 3rd record, where it is "AP RETERM 007".
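Because fields 3 to 6 never contain spaces (apart from the one inside the date/time), one way to cope with a variable-width 2nd field is to count those fields from the end of each record and treat whatever remains between field 1 and field 3 as field 2. This is a sketch under the assumptions stated in the question: every record ends with a YYYY-MM-DD HH:MM:SS date/time, and that pattern never appears inside an earlier field. It uses `reMatch()` to cut the string into records first; the `field1`..`field6` key names are illustrative.

```cfml
<cfscript>
raw = "IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEMP12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37";

// Each record runs from a 3-character code up to the first date/time after it
// (assumption: the date/time pattern never occurs inside an earlier field)
recordTexts = reMatch("\S{3} .*? \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", raw);

records = [];
for (recordText in recordTexts) {
    tokens = listToArray(recordText, " ");
    n = arrayLen(tokens);

    // Fields 3-6 have a fixed token count, so take them from the end;
    // everything between token 1 and field 3 belongs to field 2
    middle = [];
    for (i = 2; i <= n - 5; i++) {
        arrayAppend(middle, tokens[i]);
    }

    rec = {
        field1 = tokens[1],
        field2 = arrayToList(middle, " "),
        field3 = tokens[n - 4],
        field4 = tokens[n - 3],
        field5 = tokens[n - 2],
        field6 = tokens[n - 1] & " " & tokens[n]  // date and time rejoined
    };
    arrayAppend(records, rec);
}
writeDump(records);
</cfscript>
```

For the 3rd record this yields `field2` = "AP RETERM 007" and `field3` = "1418310108", because its 9 tokens are read as 1 + 3 + 5 rather than split on every space.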

Answers (2)

Regular Jo

Another option is to use a regular expression to convert your data into a JSON string, like this, and then deserialize it.

<cfsavecontent variable="sampledata">
  IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEM P12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37</cfsavecontent>

<!--- Turn each record into a JSON array literal followed by a comma --->
<cfset asJson = ReReplaceNoCase(sampledata,"\s*(.{3}) (.*?) (\d+) (.{4}) ([^\s]*) (\d+-\d+-\d+ \d+:\d+:\d+)\s*",'["\1","\2","\3","\4","\5","\6"],',"ALL")>

<!--- Replace the last comma in the generated string with a closing bracket --->
<cfset asJson = "[" & ReReplace(asJson,",$","]","ALL")>

<cfset result_array = DeSerializeJSON(asJson)>

<cfdump var="#result_array#">

You can then access the data directly from the resulting array; for example, result_array[3][2] is "AP RETERM 007".
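The question also asks for a structure. One way to get there, sketched here with hypothetical key names (`prefix`, `id`, `number`, `code`, `status`, `timestamp`), is to map each inner array onto a struct by position. The JSON literal is a stand-in for the `result_array` produced by the code above (same shape, one inner array per record), so the snippet runs on its own.

```cfml
<cfscript>
// Stand-in for result_array from the answer above: one inner array per record
result_array = deserializeJSON('[["IFX","TMP2134567","1433010010","WT33","PARTIAL","2014-11-26 09:43:58"],["IFX","AP RETERM 007","1418310108","MB01","CONFIRMED","2014-07-03 09:48:37"]]');

// Hypothetical key names, one per field position
fieldNames = ["prefix", "id", "number", "code", "status", "timestamp"];

records = [];
for (row in result_array) {
    rec = {};
    // copy each positional value into the struct under its name
    for (i = 1; i <= arrayLen(fieldNames); i++) {
        rec[fieldNames[i]] = row[i];
    }
    arrayAppend(records, rec);
}
writeDump(records);
</cfscript>
```

With this stand-in, records[2].id holds "AP RETERM 007".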

So here's how I understand the fields:

  1. 3 characters
  2. Variable string
  3. All digits
  4. 4 characters
  5. I assume this value never contains a space
  6. Date/Time
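To see how the regular expression lines up with those six assumptions, the pattern can be assembled from one piece per field. This sketch drops the leading and trailing `\s*` of the original (they only matter when several records share one string) and pulls the fields of a single record out with `reFind()` and its subexpression arrays; the variable names are illustrative.

```cfml
<cfscript>
// One capture group per field, in the same order as the numbered list
fieldPatterns = [
    "(.{3})",                      // 1. exactly 3 characters
    "(.*?)",                       // 2. variable string; lazy, so it may contain spaces
    "(\d+)",                       // 3. all digits
    "(.{4})",                      // 4. exactly 4 characters
    "([^\s]*)",                    // 5. no spaces
    "(\d+-\d+-\d+ \d+:\d+:\d+)"    // 6. date/time
];
// Fields are separated by single spaces, as in the original pattern
recordPattern = arrayToList(fieldPatterns, " ");

sample = "IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37";
match = reFind(recordPattern, sample, 1, true);

fields = [];
// index 1 describes the whole match; the capture groups follow it
for (i = 2; i <= arrayLen(match.pos); i++) {
    arrayAppend(fields, mid(sample, match.pos[i], match.len[i]));
}
writeDump(fields);
</cfscript>
```

The lazy `(.*?)` grows only until the rest of the pattern can match, which is why field 2 stops at "AP RETERM 007" instead of swallowing the digits that follow.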

Adam Cameron

Assuming the answer to my question above is "yes", this solution works:

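<cfscript>
raw = " IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEMP12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37";
// One capture group per field; the 2nd field ([\w\s]+) may contain spaces
recordPattern = "(\S+)\s+([\w\s]+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})";
keys = ["a","b","c","d","e","f"];

records = getRecordsFromString(raw, recordPattern, keys);
writeDump(records);

// Collect one struct per record by matching the pattern repeatedly
function getRecordsFromString(raw, pattern, keys){
    var offset = 1;
    var records = [];
    // stop once the offset runs past the end of the string
    while (offset <= len(raw)) {
        var result = getRecord(raw, pattern, keys, offset);
        offset = result.offset;
        if (!offset) break; // no further match
        arrayAppend(records, result.record);
    }
    return records;
}

// Match one record at or after offset; return it with the position just past the match
function getRecord(raw, recordPattern, keys, offset){
    var match = reFind(recordPattern, raw, offset, true);
    if (arrayLen(match.pos) != arrayLen(keys)+1){
        return {record="", offset=0};
    }
    // var-scoped so every call builds a new struct
    var record = {};
    for (var i = 1; i <= arrayLen(keys); i++){
        // pos[1]/len[1] describe the whole match; capture groups start at index 2
        record[keys[i]] = mid(raw, match.pos[i+1], match.len[i+1]);
    }
    // resume searching just past the end of this match
    return {record=record, offset=match.pos[1]+match.len[1]};
}
</cfscript>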

Obviously you will need to tweak the recordPattern and keys to suit your actual needs.
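As an example of such a tweak, the sketch below captures the date and the time in separate groups (the question mentions storing them separately) and uses descriptive key names. The key names are hypothetical, and the loop is a compact inline version of `getRecordsFromString()`/`getRecord()` so the snippet stands alone.

```cfml
<cfscript>
raw = "IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEMP12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37";

// Tweaked pattern: the final date/time group is split into a date group and a time group
recordPattern = "(\S+)\s+([\w\s]+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}:\d{2})";
// Hypothetical descriptive key names, one per capture group
keys = ["prefix", "id", "number", "code", "status", "date", "time"];

records = [];
offset = 1;
while (offset <= len(raw)) {
    match = reFind(recordPattern, raw, offset, true);
    if (match.pos[1] == 0) break; // no further records
    rec = {};
    for (i = 1; i <= arrayLen(keys); i++) {
        // index 1 is the whole match, so capture group i lives at index i + 1
        rec[keys[i]] = mid(raw, match.pos[i + 1], match.len[i + 1]);
    }
    arrayAppend(records, rec);
    // resume searching just past the end of this match
    offset = match.pos[1] + match.len[1];
}
writeDump(records);
</cfscript>
```

Each record then has `date` and `time` keys, e.g. records[2].time is "09:43:10".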

And if you don't understand the regular expression usage there, do yourself a favour and read up on it. I do a series on "regular expressions in CFML" on my blog, which would be an adequate starting point.
