Mehdi Yeganeh
Mehdi Yeganeh

Reputation: 2141

Parsing SQLite Database Schema in sqlitedb file?

I wrote a program for parsing SQLite file, i can parse all data from b-tree pages to record, column & values but i need to parse schema of tables, i found something like database schema stored in page 1 (root page) and i can see it with Hex Editor, and i found structure of sqlite_master, i read it exact as explain in http://sqlite.org/fileformat2.html

I want to know how can i found the first byte of sqlite_master table in db file, how can i detect starting byte of schema? is there anything related in SQLite DB Header?

Edit 1 (more info):

For example: i opened sqlite db with hex editor, (if you check my page size is 4096 bytes and i marked page header in image): enter image description here

i marked root page header that start with 05 means the page is an interior table b-tree page and please check B-tree Page Header Format (http://sqlite.org/fileformat2.html) and its have 5 cells that you can see it with this cell pointers array: 0FFB, 0FF6, 0FF1, 0FEC, 0FE7 (that start after ending header) and all cells have 5 bytes and start from 0FE7 then the schema that you can see it in picture ( in text part ) start from 232~240 and i check other dbs and schema in different place...

Edit 2:

You can download Example File from https://www.dropbox.com/s/lanky02kneyb74w/31bb7ba8914766d4ba40d6dfb6113c8b614be442

Edit 3:

In my file you can see

$ hexdump -C 31bb7ba8914766d4ba40d6dfb6113c8b614be442

00000000  53 51 4c 69 74 65 20 66  6f 72 6d 61 74 20 33 00  |SQLite format 3.|
00000010  10 00 02 02 00 40 20 20  00 00 00 02 00 00 00 3f  |.....@  .......?|
00000020  00 00 00 00 00 00 00 00  00 00 00 47 00 00 00 04  |...........G....|
00000030  00 00 00 00 00 00 00 00  00 00 00 01 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 02  |................|
00000060  00 2d e2 25 05 00 00 00  05 0f e7 00 00 00 00 3d  |.-.%...........=|
00000070  0f fb 0f f6 0f f1 0f ec  0f e7 08 7f 07 9d 08 3c  |...............<|
00000080  07 01 06 22 05 92 04 fe  03 fc 04 c1 03 4d 02 b8  |...".........M..|
00000090  02 0a 02 75 01 32 01 c7  00 e9 00 e9 00 00 00 00  |...u.2..........|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000e0  00 00 00 00 00 00 00 00  00 47 18 06 17 5b 35 01  |.........G...[5.|
000000f0  00 69 6e 64 65 78 73 71  6c 69 74 65 5f 61 75 74  |.indexsqlite_aut|
00000100  6f 69 6e 64 65 78 5f 41  42 4d 75 6c 74 69 56 61  |oindex_ABMultiVa|
00000110  6c 75 65 45 6e 74 72 79  4b 65 79 5f 31 41 42 4d  |lueEntryKey_1ABM|
00000120  75 6c 74 69 56 61 6c 75  65 45 6e 74 72 79 4b 65  |ultiValueEntryKe|

Page Header ( offset 64)

05          <- interior table b-tree page
0000        <- Byte offset into the page of the first freeblock
0005        <- Number of cells on this page
0FE7        <- Offset to the first byte of the cell content area
00          <- Number of fragmented free bytes
0000003D    (61) <- The right-most pointer

Cell Array Pointers & Cell Contents:
(Table Interior Cell Format)

Cell Pointer| Page number of left child | Rowid
------------|---------------------------|-------
0FFB        | 0000001A      (26)        | 15
0FF6        | 0000001C      (28)        | 2D
0FF1        | 00000031      (49)        | 3C
0FEC        | 00000039      (57)        | 48
0FE7        | 0000003C      (60)        | 4C     <- equal to (Offset to the first byte of the cell content area) in page header

Upvotes: 4

Views: 5896

Answers (2)

jjtx
jjtx

Reputation: 41

I realize your question was asked over a year ago and you probably resolved it, but I would like to submit an answer in case anyone else has this same question. I was in the same situation as you, Mehdi. I wanted to read a SQLite database file, and was looking for the master table / schema. It appeared to be in page 1, but the header was not pointing to it. There were two reasons for my confusion.

(1) There was a lot of "dead" data in my SQLite database file that was not being used. I believe as the database was created and grew, the location of the actual active data moved, and the old location was not overwritten with zeros. Doing a search for some of the "CREATE TABLE" statements found multiple results in different locations of the file. I later determined the actual schema was split up and located on pages 18, 10, and 8 (which the page 1 interior table pointed to). I would have detected this earlier, if not for reason #2.

(2) I had miscalculated the byte position of the page number, which confused me. Where p = page #, and s = page size, I thought it was [p * s] .... but actually it's [(p-1) * s] (except for page 1 which starts at byte 100). In other words, I thought the page numbering started at zero instead of 1.

As an additional note, I believe the http://sqlite.org/fileformat2.html page is missing some vital info. Specifically, it doesn't explain where the "root page" number is in the schema table (it's in field 4). I couldn't find this information on the sqlite.org page.

Upvotes: 4

CL.
CL.

Reputation: 180280

The documentation you linked to says in section 2.6:

Page 1 of a database file is the root page of a table b-tree that holds a special table named "sqlite_master"

and in section 1.5:

A b-tree page is divided into regions in the following order:

  1. The 100-byte database file header (found on page 1 only)
  2. The 8 or 12 byte b-tree page header …

For example, with this database:

$ sqlite3 test.db "create table hello(world);"
$ hexdump -C test.db 
00000000  53 51 4c 69 74 65 20 66  6f 72 6d 61 74 20 33 00  |SQLite format 3.|
00000010  04 00 01 01 00 40 20 20  00 00 00 01 00 00 00 02  |.....@  ........|
00000020  00 00 00 00 00 00 00 00  00 00 00 01 00 00 00 04  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 01 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
00000060  00 2d e6 03 0d 00 00 00  01 03 cf 00 03 cf 00 00  |.-æ.......Ï..Ï..|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000003c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 2f  |.............../|
000003d0  01 06 17 17 17 01 3f 74  61 62 6c 65 68 65 6c 6c  |......?tablehell|
000003e0  6f 68 65 6c 6c 6f 02 43  52 45 41 54 45 20 54 41  |ohello.CREATE TA|
000003f0  42 4c 45 20 68 65 6c 6c  6f 28 77 6f 72 6c 64 29  |BLE hello(world)|
00000400  0d 00 00 00 00 04 00 00  00 00 00 00 00 00 00 00  |................|
00000410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

... the page header at offset 0x64 has these values:

  • 0d: page is a leaf table b-tree page
  • 0000: freeblock offset
  • 0001: number of cells
  • 03cf: offset of cell content
  • 00: fragmented free bytes
  • 03cf: first cell pointer

And at offset 3cf, you have a standard table b-tree leaf cell, containing the only row of the sqlite_master table:

sqlite> select * from sqlite_master;
type        name        tbl_name    rootpage    sql                      
----------  ----------  ----------  ----------  -------------------------
table       hello       hello       2           CREATE TABLE hello(world)

Upvotes: 2

Related Questions