newspaint

Documenting Problems That Were Difficult To Find The Answer To

Converting Raw Bytes Into Int32, Int16, UInt32, UInt16

If you find yourself processing raw bytes in Perl and want to decode signed and unsigned integers then the following routines will help.

Network-Ordered (Big Endian) Encoded Numbers

Note that the following code is for network-ordered (big endian) numbers.

To verify that the routines are correct we can write the tests first (test-driven development), paying particular attention to the boundary conditions:

use Test::More;

is( str_to_uint32( "\x00\x00\x00\x01" ),          1, "uint32 1" );
is( str_to_uint32( "\xFF\xFF\xFF\xFF" ), 4294967295, "uint32 4294967295" );
is( str_to_uint16( "\x00\x01" ),                  1, "uint16 1" );
is( str_to_uint16( "\xFF\xFF" ),              65535, "uint16 65535" );
is( str_to_int32( "\x00\x00\x00\x01" ),           1, "int32 1" );
is( str_to_int32( "\xFF\xFF\xFF\xFF" ),          -1, "int32 -1" );
is( str_to_int32( "\x7F\xFF\xFF\xFF" ),  2147483647, "int32 2147483647" );
is( str_to_int32( "\x80\x00\x00\x00" ), -2147483648, "int32 -2147483648" );
is( str_to_int16( "\x00\x01" ),                   1, "int16 1" );
is( str_to_int16( "\xFF\xFF" ),                  -1, "int16 -1" );
is( str_to_int16( "\x7F\xFF" ),               32767, "int16 32767" );
is( str_to_int16( "\x80\x00" ),              -32768, "int16 -32768" );

done_testing(); # declare that all planned tests have run

Decoding network-ordered unsigned integers is very easy in Perl with the unpack() function. For documentation see the pack() function and perlpacktut tutorial pages.

sub str_to_uint32 {
  return unpack( "N", $_[0] ); # "N" for "Network" order (big-endian)
}

sub str_to_uint16 {
  return unpack( "n", $_[0] ); # "n" for "Network" order (big-endian)
}
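A single unpack() template can also decode several fields at once. The sketch below assumes a made-up 6-byte record (a 16-bit type followed by a 32-bit length); the record layout and the name decode_record are hypothetical and only serve to illustrate the "n" and "N" formats together.

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption for illustration only):
#   bytes 0-1: unsigned 16-bit type,   network order
#   bytes 2-5: unsigned 32-bit length, network order
sub decode_record {
  my ( $bytes ) = @_;

  # Whitespace in the template is ignored; "n" reads 2 bytes, "N" reads 4.
  return unpack( "n N", $bytes );
}

my ( $type, $length ) = decode_record( "\x00\x02\x00\x00\x01\x00" );
print "type=$type length=$length\n"; # type=2 length=256
```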

Decoding network-ordered signed integers is slightly more difficult, as the classic pack() templates have no big-endian signed format. (Perl 5.10 and later accept a ">" modifier, as in "l>" or "s>", but the approach here also works on older versions.) Instead we can decode the unsigned representation and, if the sign bit is set, apply two’s complement. From the Wikipedia page:

Conveniently, another way of finding the two’s complement of a number is to take its ones’ complement and add one.
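As a small worked illustration of that rule, the following sketch applies it to a 16-bit value. The helper name twos_complement16 is hypothetical; it simply flips the bits, adds one, and keeps the low 16 bits.

```perl
use strict;
use warnings;

# Two's complement within 16 bits: ones' complement, then add one.
sub twos_complement16 {
  my ( $value ) = @_;
  return ( ( ( ~$value ) & 0xFFFF ) + 1 ) & 0xFFFF;
}

my $encoded   = twos_complement16( 2 );        # 0xFFFD + 1 = 0xFFFE
my $magnitude = twos_complement16( $encoded ); # 0x0001 + 1 = 2

printf "encoded=0x%04X magnitude=%d\n", $encoded, $magnitude;
# encoded=0xFFFE magnitude=2
```

Applying the operation twice returns the original value, which is why the same steps can be used to recover the magnitude of a negative number.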

The corollary is that to decode a negative number we take the ones’ complement of the unsigned value, add one to recover the magnitude, and then negate the result. We use the pragma use integer; to ensure that bit flipping is done on integral values, not floating-point.

sub str_to_int32 {
  use integer;

  my $num = str_to_uint32( $_[0] );
  if ( $num & 0x80000000 ) {
    $num = 0 - ( ( ( ~ $num ) & 0xFFFFFFFF ) + 1 );
  }

  return $num;
}

sub str_to_int16 {
  use integer;

  my $num = str_to_uint16( $_[0] );
  if ( $num & 0x8000 ) {
    $num = 0 - ( ( ( ~ $num ) & 0xFFFF ) + 1 );
  }

  return $num;
}
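On Perl 5.10 or later, the same decoding can be sketched without any manual bit manipulation by adding the ">" (big-endian) modifier to the signed formats "l" and "s". The function names below are hypothetical alternatives, not replacements for the routines above, and they assume a Perl new enough to support the modifier.

```perl
use strict;
use warnings;

# Assumes Perl 5.10+, where ">" forces big-endian byte order.
sub str_to_int32_be {
  return unpack( "l>", $_[0] ); # "l" is a signed 32-bit value
}

sub str_to_int16_be {
  return unpack( "s>", $_[0] ); # "s" is a signed 16-bit value
}

print str_to_int32_be( "\x80\x00\x00\x00" ), "\n"; # -2147483648
print str_to_int16_be( "\xFF\xFF" ), "\n";         # -1
```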

Upon running our tests we get the following output:

me@myhost:~ $ perl -w test_functions.pl
ok 1 - uint32 1
ok 2 - uint32 4294967295
ok 3 - uint16 1
ok 4 - uint16 65535
ok 5 - int32 1
ok 6 - int32 -1
ok 7 - int32 2147483647
ok 8 - int32 -2147483648
ok 9 - int16 1
ok 10 - int16 -1
ok 11 - int16 32767
ok 12 - int16 -32768

Little Endian Encoded Numbers

The same routines can be used with minor differences. First, we set up our tests:

use Test::More;

is( str_to_uint32( "\x01\x00\x00\x00" ),          1, "uint32 1" );
is( str_to_uint32( "\xFF\xFF\xFF\xFF" ), 4294967295, "uint32 4294967295" );
is( str_to_uint16( "\x01\x00" ),                  1, "uint16 1" );
is( str_to_uint16( "\xFF\xFF" ),              65535, "uint16 65535" );
is( str_to_int32( "\x01\x00\x00\x00" ),           1, "int32 1" );
is( str_to_int32( "\xFF\xFF\xFF\xFF" ),          -1, "int32 -1" );
is( str_to_int32( "\xFF\xFF\xFF\x7F" ),  2147483647, "int32 2147483647" );
is( str_to_int32( "\x00\x00\x00\x80" ), -2147483648, "int32 -2147483648" );
is( str_to_int16( "\x01\x00" ),                   1, "int16 1" );
is( str_to_int16( "\xFF\xFF" ),                  -1, "int16 -1" );
is( str_to_int16( "\xFF\x7F" ),               32767, "int16 32767" );
is( str_to_int16( "\x00\x80" ),              -32768, "int16 -32768" );

done_testing(); # declare that all planned tests have run

All we have to do is change the two unsigned functions; the signed routines build on them and need no changes:

sub str_to_uint32 {
  return unpack( "V", $_[0] ); # "V" for "VAX" order (little-endian)
}

sub str_to_uint16 {
  return unpack( "v", $_[0] ); # "v" for "VAX" order (little-endian)
}
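For little-endian signed values, Perl 5.10 or later also offers a "<" modifier that can be attached to the signed formats. The sketch below uses hypothetical function names and assumes such a Perl version; it is an alternative to reusing the signed routines, not something the approach above requires.

```perl
use strict;
use warnings;

# Assumes Perl 5.10+, where "<" forces little-endian byte order.
sub str_to_int32_le {
  return unpack( "l<", $_[0] ); # signed 32-bit, least significant byte first
}

sub str_to_int16_le {
  return unpack( "s<", $_[0] ); # signed 16-bit, least significant byte first
}

print str_to_int32_le( "\x00\x00\x00\x80" ), "\n"; # -2147483648
print str_to_int16_le( "\xFF\x7F" ), "\n";         # 32767
```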

Upon running our tests we get the following output:

me@myhost:~ $ perl -w test_functions.pl
ok 1 - uint32 1
ok 2 - uint32 4294967295
ok 3 - uint16 1
ok 4 - uint16 65535
ok 5 - int32 1
ok 6 - int32 -1
ok 7 - int32 2147483647
ok 8 - int32 -2147483648
ok 9 - int16 1
ok 10 - int16 -1
ok 11 - int16 32767
ok 12 - int16 -32768
