roryhewitt
roryhewitt

Reputation: 4507

Testing bitmask when stored as integer and available as string

I have a bitmask (really a 'flagmask') of integer values (1, 2, 4, 8, 16 etc.) which apply to a field and I need to store this in a (text) log file. What I effectively store is something like "x=296" which indicates that for field "x", flags 256, 32 and 8 were set.

When searching the logs, how can I easily search this text string ("x=nnn") and determine from the value of "nnn" whether a specific flag was set? For instance, how could I look at the number and know that flag 8 was set?

I know this is a somewhat trivial question if we're doing 'true' bitmask processing, but I've not seen it asked this way before - the log searching will just be doing string matching, so it just sees a value of "296" and there is no way to convert it to its constituent flags - we're just using basic string searching with maybe some easy SQL in there.

Upvotes: 4

Views: 133

Answers (3)

user21103923
user21103923

Reputation:

use a bitwise operator & to check if a specific flag is set in the value

function hasFlag(value, flag) {
  return (value & flag) === flag;
}

const value = 296;
console.log(hasFlag(value, 256)); // true
console.log(hasFlag(value, 32)); // true
console.log(hasFlag(value, 8)); // true
console.log(hasFlag(value, 4)); // false

Upvotes: -1

LSerni
LSerni

Reputation: 57368

Much depends on how "easy" the "easy SQL" must be.

If you can use string splitting to separate "X" from "296", then using AND with the value 296 is easy.

If you cannot, then, as you observed, 296 yields no traces of its 8th bit state. You'd need to check for all the values that have the 8th bit set, and of course that means exactly half of the available values. You can compress the regexp somewhat:

248 249 250 251 252 253 254 255 => 24[89]|25[0-5] 264 265 266 267 268 269 270 271 => 26[4-9]|27[01] 280 281 282 283 284 285 286 287 => 28[0-7] 296 297 298 299 300 301 302 303 => 29[6-9]|30[1-3]

...24[89]|25[0-5]|26[4-9]|27[01]|28[0-7]|29[6-9]|30[1-3]...

but I think that the evaluation tree won't change very much, this kind of optimization is already present in most regex engines. Performances are going to be awful.

If at all possible I would try to rework the logs (maybe with a filter) to rewrite them as "X=000100101000b" or at least "X=0128h". The latter hexadecimal would allow you to search for bit 8 looking for "X=...[89A-F]h" only.

I have several times had to do changes like these - it involves preparing a pipe filter to create the new logs (it is more complicated if the logging process writes straight to a file - at worst, you might be forced to never run searches on the current log file, or to rotate the logs when such a search is needed), then slowly during low-load periods the old logs are retrieved, decompressed, reparsed, recompressed, and stored back. It's long and boring, but doable.

For database-stored logs, it's much easier and you can even avoid any alterations to the process that does the logs - you add a flag to the rows to be converted, or persist the timestamps of the catch-up conversion:

SELECT ... FROM log WHERE creation > lowWaterMark ORDER BY creation DESC; SELECT ... FROM log WHERE creation < highWaterMark ORDER BY creation ASC;

the retrieved rows are updated, and the appropriate watermarkers are updated with the last creation value that has been retrieved. The "lowWaterMark" pursues the logging process. This way you only can't search from the current instant down to lowWaterMark, which will "lag" behind (ultimately, just a few seconds, except under heavy load).

Upvotes: 0

James Risner
James Risner

Reputation: 6076

Your best bet is to do full string regex matching.

For bit 8 set, do this:

SELECT log_line
FROM log_table
WHERE log_line
RLIKE 'x=(128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|210|211|212|213|214|215|216|217|218|219|220|221|222|223|224|225|226|227|228|229|230|231|232|233|234|235|236|237|238|239|240|241|242|243|244|245|246|247|248|249|250|251|252|253|254|255)[^0-9]'

Or simplified to:

RLIKE 'x=(12[89]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])[^0-9]'

The string x= must exist in the logs followed by the decimal number and a non decimal after the number.

For bit 2 set, do this:

SELECT log_line FROM log_table WHERE log_line RLIKE 'x=(2|3|6|7|10|11|14|15|18|19|22|23|26|27|30|31|34|35|38|39|42|43|46|47|50|51|54|55|58|59|62|63|66|67|70|71|74|75|78|79|82|83|86|87|90|91|94|95|98|99|102|103|106|107|110|111|114|115|118|119|122|123|126|127|130|131|134|135|138|139|142|143|146|147|150|151|154|155|158|159|162|163|166|167|170|171|174|175|178|179|182|183|186|187|190|191|194|195|198|199|202|203|206|207|210|211|214|215|218|219|222|223|226|227|230|231|234|235|238|239|242|243|246|247|250|251|254|255)[^0-9]'

I test the bit2 on some live data:

SELECT count(log_line)
...
count(log_line)
128

1 row in set (0.002 sec)

Upvotes: 3

Related Questions