Reputation: 3495
I have, in a database, records that are serialized PHP strings that I must obfuscate emails if there are any. The simplest record is like {s:20:"[email protected]"}
. It is basically saying: this is a string of length 20 which is [email protected]
. This field can be kilobytes long with lot of emails (or none) and sometimes it is empty.
I wish I could use a SQL regular expression function to obfuscate the user part of the email while preserving the length of the string in order not to break the PHP serialization. The example email above shall be turned into {s:20:"[email protected]"}
where the number of x
matches the length of pika.chu
.
Any thoughts?
Here is a more complete example of what can be found as serialized PHP:
a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"[email protected]";s:7:"authors";a:2:{i:0;s:21:"[email protected]";i:1;s:19:"[email protected]";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}
Upvotes: 1
Views: 109
Reputation: 1955
I tried to do it using native functions but it not worked because functions like REGEXP_REPLACE
don't let you manipulate the match to get the size of it, for example.
Instead, I've created a UDF to do that:
CREATE TEMP FUNCTION hideEmail(str STRING)
RETURNS STRING
LANGUAGE js AS """
return str
.replace(/([a-zA-Z.0-9_\\+-:]*)@/g, function(txt){return '*'.repeat(txt.length-1)+"@";})
""";
select hideEmail('a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"[email protected]";s:7:"authors";a:2:{i:0;s:21:"[email protected]";i:1;s:19:"[email protected]";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}')
Result:
a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"****@something.com";s:7:"authors";a:2:{i:0;s:21:"*******@something.com";i:1;s:19:"******@software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}
Upvotes: 2