Reputation: 513
I am working on a project that calls for parsing out some financial files into a database. The files have no delimiters, each record is essentially a huge string and they give me the map of what positions are what field.
Here is one record:
GCA0000000188T0544291021BW1079 021BW1079 JOE88TSUSD2015041420150414000000001053060070+000000001053060070+000000001053060070+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000001053060+000000000001053060+000000000000263265+000000000000000000+021BW107904 042 000000000000000000+00000000000000000000000000000000000000000000PEPPERS AN034 04100003ALTERNATIVE FOCUS BLUETREND L.P. HOLDBACK NUSD000000000010530600+0000000000+000000010000000000+000000000010530600+USD
And the mapping goes something like this:
position 000-002 -> Field X
position 003-012 -> Field Y
position 013-023 -> Field Z
and so on.
Question here: whats the best way to do this? I am on a php stack. I was thinking about creating methods for each field, passing in the whole string, and returning the values. Is there a package that I can use to help?
Upvotes: 0
Views: 188
Reputation: 26375
The built-in function sscanf()
works well for fixed-width fields like this - given the proper format it'll parse it out into an array. The format is described in the documentation for sprintf()
(which is its counterpart) with only a few differences.
Here it's pretty simple:
%3s
- 3 byte long string%10s
- 10 byte long string%11s
- 11 byte long string$fields = sscanf($record, '%3s%10s%11s');
var_dump($fields);
array(3) {
[0]=>
string(3) "GCA"
[1]=>
string(10) "0000000188"
[2]=>
string(11) "T0544291021"
}
If instead you'd prefer for example to:
You could use:
%3s
- 3 byte long string%10d
- 10 character wide signed decimalT
- literal "T" (not returned)%10d
- 10 character wide signed decimal$fields = sscanf($record, '%3s%10dT%10d');
var_dump($fields);
array(3) {
[0]=>
string(3) "GCA"
[1]=>
int(188)
[2]=>
int(544291021)
}
An alternative would be to use unpack()
, whose format information you can find in the documentation for its counterpart pack()
A3Field X
- Unpack 3 bytes as a string into "Field X"A10Field Y
- Unpack 10 bytes as a string into "Field Y"A11Field Z
- Unpack 11 bytes as a string into "Field Z"$fields = unpack('A3Field X/A10Field Y/A11Field Z', $record);
var_dump($fields);
array(3) {
["Field X"]=>
string(3) "GCA"
["Field Y"]=>
string(10) "0000000188"
["Field Z"]=>
string(11) "T0544291021"
}
Whether to use sscanf
or unpack
depends on the nature of your data and how exactly you need to treat it: sscanf
does well interpreting characters in the string while unpack
does well interpreting the bytes that make it up (it's more useful for binary data.)
For example given the string "1"
with unpack you can get the value of the byte it consists of (i.e. its ASCII value):
$fields = unpack('c', "1");
var_dump($fields[1]); //int(49)
unpack
is powerful but complex, you can find more information on it at http://perldoc.perl.org/perlpacktut.html
Upvotes: 2
Reputation: 7886
The easiest solution is probably to just use substr if you have what columns to use.
$string = /* the really long string */
$field_x = substr($string, 0, 3);
$field_y = substr($string, 3, 9);
/* etc */
Upvotes: 0