imperium2335
imperium2335

Reputation: 24132

rawurlencode for storing data

I have always used rawurlencode to store user entered data into my mysql databases. The main reason I do this is so that stroing foreign characters is very simple I find. I'd then use rawurldecode to retrieve and display the data.

I read somewhere that rawurlencode was not meant for this purpose. Are there any disadvantages to what I'm doing?


So let's say I have a German address with many characters like umlauts etc. What is the simplest way to store this in a mysql database with no risks of it coming out wrong and being searchable using a search script? So far rawurelencode has been excellent for our system. Perhaps the practise can be improved upon by only encoding foreign letters and not common characters like spaces etc, which is a waste of space I totally agree.

Upvotes: 1

Views: 589

Answers (3)

Álvaro González
Álvaro González

Reputation: 146630

Drawbacks I can think of:

  • Waste of disk space.
  • Waste of CPU cycles encoding and decoding on every read and every write.
  • Additional complexity (you can't even inspect data with a MySQL client).
  • Impossibility to use full text searches.
  • URL encoding is not necessarily unique (there're at least two RFCs). It may not lead to data loss but it can lead to duplicate data (e.g., unique indexes where two rows actually contain the same piece of data).
  • You can accidentally encode a non-string piece of data such as a date: 2012-04-20%2013%3A23%3A00

But the main consideration is that such technique is completely arbitrary and unnecessary since MySQL doesn't have the least problem storing the complete Unicode catalogue. You could also decide to swap e's and o's in all strings: Holle, werdl!. Your app would run fine but it would not provide any added value.

Update: As Your Common Sense points out, a SQL clause as basic as ORDER BYis no longer usable. It's not that international chars will be ignored; you'll basically get an arbitrary sort order based on the ASCII code of the % and hexadecimal characters. If you can't SELECT * FROM city ORDER BY city_name reliably, you've rendered your DB useless.

Upvotes: 0

Your Common Sense
Your Common Sense

Reputation: 157980

  • I am using a fork to eat a soup
  • I am using money bills to fire the coals for BBQ
  • I am using a kettle to boil eggs.
  • I am using a microscope to hammer the nails.

Are there any disadvantages to what I'm doing?

YES

You are using a tool not on purpose. This is always a disadvantage.

A sane human being alway using a tool that is intended for the certain job. Not some randomly picked one. Especially if there is no shortage in the right tool supply.

URL encoding is not intended to be used with database, as one can tell from the name. That's alone reason enough for the sane developer. Take a look around: find the proper tool.

There is a thing called "common sense" - a thing widely used in the regular life but for some reason always absent in the php world.
A common sense can warn us: if we're using a wrong tool, it may spoil the work. Sooner or later it will spoil it. No need to ask for the certain details - it's a general rule. We are learning this rule at about age of 5.

Why not to use it while playing with some web thingies too?

Why not to ask yourself a question:

What's wrong with storing foreign characters at all?

urlencode makes stroing foreign characters very simple

Any hardships you encountered without urlencode?

Although I feel that common sense should be enough to answer the question, people always look for the "omen", the proof. Here you are:

Database's job is not limited to just storing and retrieving data. A plain text file can handle such a primitive task as well.
Data manipulations is what we are using databases for.
Most widely used ones are sorting and filtering.

  • Such a quite intelligent thing as a database can sort and filter data character-insensitive, which is very handy feature. But of course it can be done only if characters being saved as is, not as some random codes.
  • Sorting texts also may use ordering other than just binary order in the character table. Some umlaut characters may be present at the other parts of the table but database collation will put them in the right place. Of course it can be done only if characters being saved as is, not as some random codes.
  • Sometimes we have to manipulate the data that already stored in the database. Say, cut some piece from the string and compare with the entered value. How it is supposed to be done with urlencoded data?

Upvotes: 0

Jon
Jon

Reputation: 437774

Sure there are.

Let's start with the practical: for a large class of characters you are spending 3 bytes of storage for every byte of data. The description of rawurlencode (and of course the RFC) say that those characters are

all non-alphanumeric characters except -_.~

This means that there is a total of 26 + 26 + 10 (alphanumeric) + 4 (special exceptions) = 66 characters for which you do not waste space.

Then there are also the logical drawbacks: You are not storing the data itself, but rather a representation of the data tailored to URLs. Unless the data itself is URLs, that's not what you should be doing.

Upvotes: 1

Related Questions