Matt Ellen

Reputation: 11612

What does %d mean in struct.pack?

I was reading through a library of Python code, and I'm stumped by this statement:

struct.pack( "<ii%ds"%len(value), ParameterTypes.String, len(value), value.encode("UTF8") )

I understand everything but %d, and I'm not sure why the length of value is being packed in twice.

As I understand it, the structure will have little-endian encoding (<) and will contain two integers (ii) followed by %d, followed by a string (s).

What is the significance of %d?

Upvotes: 2

Views: 3399

Answers (5)

Brendan

Reputation: 19413

It is an ordinary string format operation, which is being used to create the struct format.

Try reading it to begin with as an ordinary string (forget struct for the moment) ...

"<ii%ds" % len(value)

If, for example, the length of the value iterable is 4, then the string will be <ii4s. This is then passed to struct.pack, ready to pack two integers followed by a four-byte string taken from the value iterable.
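
A minimal Python 3 sketch of the two steps. The sample value "spam" and the constant PARAM_TYPE_STRING are made up for illustration; the real value of ParameterTypes.String isn't shown in the question.

```python
import struct

value = "spam"                  # hypothetical sample value
PARAM_TYPE_STRING = 3           # hypothetical stand-in for ParameterTypes.String

# Step 1: ordinary string formatting builds the struct format.
fmt = "<ii%ds" % len(value)
print(fmt)                      # <ii4s

# Step 2: struct.pack uses that format on the three values.
packed = struct.pack(fmt, PARAM_TYPE_STRING, len(value), value.encode("UTF8"))
print(packed)                   # b'\x03\x00\x00\x00\x04\x00\x00\x00spam'
print(len(packed))              # 12 (4 + 4 + 4 bytes)
```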

Upvotes: 1

S.Lott

Reputation: 392050

The %d means this works in two steps.

Step 1.

"<ii%ds"%len(value) 

Creates the struct formatting string of "<ii...some number...s".

Step 2.

The resulting formatting string is applied to three values

ParameterTypes.String, len(value), value.encode("UTF8")

Upvotes: 0

John Machin

Reputation: 83032

Aarrrgh the mind boggles ....

@S.Lott: """I don't think the number is particularly important, since Python will tend to pack correctly without it.""" -1. Don't think; investigate. Without a number means merely that the number defaults to 1. Tends to pack correctly??? Perhaps you think that struct.pack("s", foo) works the same way as "%s" % foo? It doesn't; docs say """For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. For packing, the string is truncated or padded with null bytes as appropriate to make it fit."""
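
To see what that passage from the docs means in practice, here is a small sketch of how the count on 's' behaves. It is written for Python 3, where 's' takes a bytes object, and the sample bytes are arbitrary.

```python
import struct

# No count means a count of 1: only the first byte survives.
one = struct.pack("s", b"abc")        # b'a'

# A count smaller than the data truncates it.
short = struct.pack("2s", b"abc")     # b'ab'

# A count larger than the data pads with null bytes.
padded = struct.pack("5s", b"abc")    # b'abc\x00\x00'

# For 'c' the count is a repeat count: three separate one-byte arguments.
chars = struct.pack("3c", b"a", b"b", b"c")   # b'abc'

print(one, short, padded, chars)
```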

@Brendan: -1. value is not an array (whatever that is); it is patently obviously intended to be a unicode string ... lookee here: value.encode("UTF8")

@Matt Ellen: The line of code that you quote is severely broken. If there are any non-ASCII characters in value, data will be lost.

Let's break it down:

struct.pack("<ii%ds"%len(value), ParameterTypes.String, len(value), value.encode("UTF8"))

Reduce problem space by removing the first item

struct.pack("<i%ds"%len(value), len(value), value.encode("UTF8"))

Now let's suppose that value is u'\xff\xff', so len(value) is 2.

Let v8 = value.encode('UTF8') i.e. '\xc3\xbf\xc3\xbf'.

Note that len(v8) is 4. Is the penny dropping yet?

So what we now have is

struct.pack("<i2s", 2, v8)

The number 2 is packed as 4 bytes, 02 00 00 00. The 4-byte string v8 is TRUNCATED (by the length 2 in "2s") to length two. DATA LOSS. FAIL.
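
The truncation is easy to reproduce. A sketch in Python 3 (the code above is Python 2, so here the u prefix is dropped and encode returns bytes):

```python
import struct

value = "\xff\xff"               # two characters, U+00FF twice
v8 = value.encode("UTF8")        # b'\xc3\xbf\xc3\xbf', four bytes

# The format is built from the character count (2), not the byte count (4).
packed = struct.pack("<i%ds" % len(value), len(value), v8)
print(packed)                    # b'\x02\x00\x00\x00\xc3\xbf'

# Reading it back recovers only the first character.
length, data = struct.unpack("<i2s", packed)
print(data.decode("UTF8"))       # ÿ
```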

The correct way to do what is presumably wanted is:

v8 = value.encode('UTF8')
struct.pack("<ii%ds" % len(v8), ParameterTypes.String, len(v8), v8)
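
This also suggests why the length appears twice: once in the format string, so that pack knows how many bytes to write, and once as a packed integer, presumably so that whoever reads the data knows how many bytes follow. A rough round-trip sketch in Python 3; pack_string, unpack_string and the type code 3 are hypothetical names and values, not part of the library in the question.

```python
import struct

def pack_string(type_code, value):
    """Pack a type code, the encoded byte length, and the encoded bytes."""
    data = value.encode("UTF8")
    return struct.pack("<ii%ds" % len(data), type_code, len(data), data)

def unpack_string(buf):
    """Read the two integers first, then use the length to read the bytes."""
    type_code, length = struct.unpack_from("<ii", buf, 0)
    (data,) = struct.unpack_from("<%ds" % length, buf, 8)
    return type_code, data.decode("UTF8")

buf = pack_string(3, "\xff\xff")     # 3 is a made-up type code
print(unpack_string(buf) == (3, "\xff\xff"))   # True
```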

Upvotes: 2

Andrew

Reputation: 13191

The significance of %d is that it's a formatting parameter for strings (see "String Formatting Operations" in the Python documentation).

When broken apart, "<ii%ds" % len(value) is a bit easier to understand. It replaces the %d conversion specifier in the string with the return value of len(value), formatted as a decimal integer.

>>> fmt = "<ii%ds"
>>> fmt % 5
'<ii5s'
>>> fmt % 3
'<ii3s'

Upvotes: 1

Michael F

Reputation: 40879

It's used to specify that a string (value) of len(value) bytes is to be packed after those two integers.

If, for instance, value contained "boo" then the actual format specifier for pack would be "<ii3s".

Upvotes: 0
