Thursday, August 9, 2012

TLV Encoding of Data

Introduction

Representing data in XML is all the rage these days. A decade ago, after the success of the J2EE platform, distributed computing with CORBA and RMI fell out of fashion and made way for Web Services, with XML, in the guise of SOAP, as the data transport format. In the other part of the world, the one that works with embedded systems, binary transport is the rule of the day. For someone like me, who walked into embedded computing from the world of enterprise application development, this is sometimes a bit strange and at other times interesting.

Well, the situation described here is depicted in the figure below:


I have some magic in the smart card (MIFARE DESFire) shown in the picture, which is read by a reader connected to my PC via a USB cable. An application on my PC talks to this reader and exchanges some data with the smart card. In this article, I am only interested in the data format used for this exchange. The actual contents of the data, and what can be done with them, are dealt with later.

ASN.1 is an abstract notation for describing data - it simply defines primitives, composites and structures of data, not how they are laid out in bytes. Suppose some payment-specific data is defined as follows:

Track1 ::= SEQUENCE
{
   primaryAccountNumber INTEGER,
   cardHolderName IA5String,
   expirationDate DATE,
   serviceCode INTEGER
}
Now, this is just an abstract representation of how the data must be structured. How the data is actually encoded is defined by various encoding schemes such as BER, CER, DER and XER. The relation between ASN.1 and each of these encoding schemes is akin to that between an abstract class (ASN.1) and a concrete class (the encoding). If you intend to have a better grasp on ASN.1, nothing helps you better than this reference. Now we move on to the topic of TLV data encoding.

T[ag] L[ength] V[alue] Structure

In this encoding, the data is represented as follows:
Every piece of data is identified by its tag, followed by the length of the data (value) and then the data payload (value) itself. This definition is recursive, in the sense that the value may itself contain data in TLV structure.

With BER, ASN.1 instances are coded in TLV: every data object consists of a Tag, a Length and a Value. The tag tells whether the object is an integer, a boolean, a structure or something else. For example, a vocabulary might be defined such as:
30: Sequence
0C: UTF-8 string
02: Integer
01: Boolean
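
As a rough sketch of how this vocabulary turns into bytes, the snippet below builds a nested TLV object by hand. The record contents (a name, an age and a flag) are made up for illustration, and the helper `tlv` is a hypothetical name; it only handles values shorter than 128 bytes, so the length always fits in one byte.

```python
# Illustrative tag vocabulary taken from the list above.
SEQUENCE, UTF8_STRING, INTEGER, BOOLEAN = 0x30, 0x0C, 0x02, 0x01


def tlv(tag: int, value: bytes) -> bytes:
    """Build one TLV object: tag byte, length byte, then the value.

    Assumption: single-byte tags and values shorter than 128 bytes only.
    """
    if len(value) > 127:
        raise ValueError("this sketch supports single-byte lengths only")
    return bytes([tag, len(value)]) + value


# Made-up sample data: the value of a SEQUENCE is itself a run of TLV objects.
name = tlv(UTF8_STRING, "Bob".encode("utf-8"))
age = tlv(INTEGER, bytes([30]))
active = tlv(BOOLEAN, b"\xff")
record = tlv(SEQUENCE, name + age + active)

print(record.hex(" ").upper())
# 30 0B 0C 03 42 6F 62 02 01 1E 01 01 FF
```

Reading the output from left to right: 30 opens the sequence, 0B says that 11 bytes follow, and those 11 bytes are three smaller TLV objects packed back to back.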

The vocabulary above is for illustration purposes only. An industry can create its own vocabulary of tags, such as {0x30, 0x0C, 0x02, 0x01}, that is understood by everyone in that industry. For consistency, however, ISO 7816-4 defines a set of rules for how an industry MAY define its tags. These rules cover the following:
  • How a single-byte tag is defined
  • How a multi-byte tag is defined and identified
  • How a length is coded on one byte
  • How a length is coded on multiple bytes
And others, such as:
  • How a tag is defined so that its meaning is understood universally
  • How a tag is defined so that its meaning is understood for a specific application
  • How a tag is defined that has a meaning only within a specific context
  • How a tag is defined that is privately understood; other applications need not understand such tags
These questions are conveniently answered visually in the following two figures, one illustrating tag encoding and the other length encoding. The color coding is:
  1. Class of a tag (pink)
  2. Whether the tag is primitive or constructed (orange)
  3. Value of a Tag
  4. Bits are turned on
  5. Bits are turned off

BER (Basic Encoding Rules), Tag Encoding


Bits in one byte tag
         b8 b7   b6    b5 b4 b3 b2 b1
         class   P/C   tag value


Bits in two byte tag
Byte 1:  b8 b7   b6    1  1  1  1  1
         class   P/C   (all ones)
Byte 2:  b8 b7 b6 b5 b4 b3 b2 b1
         tag value


BER (Basic Encoding Rules), Length Encoding


Encoding of length (0-127), 1-byte
Byte 1:  0  b7 b6 b5 b4 b3 b2 b1


Encoding of length (128-255), 2-byte
Byte 1:  1  0  0  0  0  0  0  1
Byte 2:  b8 b7 b6 b5 b4 b3 b2 b1


Encoding of length (256-65535), 3-byte
Byte 1:  1  0  0  0  0  0  1  0
Byte 2:  b8 b7 b6 b5 b4 b3 b2 b1
Byte 3:  b8 b7 b6 b5 b4 b3 b2 b1

Now, it's easy to derive the rules of encoding any data in TLV format as specified in ISO 7816-4. Given the bit representation b8b7b6b5b4b3b2b1 of one byte, we can derive some of the rules from the above color coded representation:
  1. When a tag is encoded on one byte, it has the form xxxb5b4b3b2b1, where the last 5 bits denote the tag value (anything other than 11111) and each x is either 0 or 1.
  2. When a tag is encoded on two bytes, it has the form xxx11111 b8b7b6b5b4b3b2b1, where the class and the primitive/constructed bit are on the first byte and the second byte contains the tag value. Each x is either 0 or 1, and b8 of the second byte is 0, which marks it as the last tag byte.
  3. If the length of the value (data) is < 128, it is encoded on 1 byte as 0b7b6b5b4b3b2b1. If the first bit (b8) is turned off, the length field is a single byte.
  4. If the length of the value (data) is between 128 and 255, the length is encoded on 2 bytes in the form 10000001 b8b7b6b5b4b3b2b1. When the MSB of the first byte is 1, the length is > 127 and the remaining 7 bits give the number of length bytes that follow, here 1.
  5. If the length of the value (data) is between 256 and 65535, the length is encoded on 3 bytes in the form 10000010 b8b7b6b5b4b3b2b1 b8b7b6b5b4b3b2b1. Here the first byte announces 2 length bytes.
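
Length rules 3 to 5 above can be sketched in a few lines of Python. The function names `encode_length` and `decode_length` are hypothetical, and the sketch deliberately stops at 65535, the largest length covered by the rules above.

```python
from typing import Tuple


def encode_length(n: int) -> bytes:
    """Encode a value length following rules 3-5 (0..65535 only)."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("length out of range for this sketch")
    if n < 0x80:
        return bytes([n])                      # rule 3: 0xxxxxxx
    if n <= 0xFF:
        return bytes([0x81, n])                # rule 4: 10000001 + 1 byte
    return bytes([0x82, n >> 8, n & 0xFF])     # rule 5: 10000010 + 2 bytes


def decode_length(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (length, size of the length field in bytes)."""
    first = data[offset]
    if first < 0x80:
        return first, 1
    count = first & 0x7F                       # number of length bytes that follow
    if count not in (1, 2):
        raise ValueError("only 0x81 and 0x82 are handled in this sketch")
    raw = data[offset + 1:offset + 1 + count]
    if len(raw) != count:
        raise ValueError("truncated length field")
    return int.from_bytes(raw, "big"), 1 + count


for n in (5, 127, 128, 255, 256, 65535):
    print(n, "->", encode_length(n).hex(" ").upper())
```

The loop prints one boundary value per rule, which makes it easy to see the switch from one to two to three bytes at 128 and 256.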
We will illustrate these rules with some concrete examples.
Tag    Representation       Classification    Purpose
A4     10100100             Single Byte Tag   Select Command
4F     01001111             Single Byte Tag   Applet AID
5F34   01011111 00110100    2 Byte Tag
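
The classification column in the table can be reproduced with a small check on the low five bits of the first byte. This is only a sketch: `read_tag` is a hypothetical helper name, and tags longer than two bytes are rejected rather than handled.

```python
def read_tag(data: bytes, offset: int = 0) -> bytes:
    """Return the tag bytes found at offset (one or two bytes)."""
    first = data[offset]
    if first & 0x1F != 0x1F:
        # Low five bits are not all ones: the tag fits in this byte.
        return data[offset:offset + 1]
    second = data[offset + 1]
    if second & 0x80:
        # b8 set on the second byte would mean a third tag byte follows.
        raise ValueError("tags longer than two bytes are outside this sketch")
    return data[offset:offset + 2]


# The third sample carries one extra byte after the tag to show that
# only the tag bytes are consumed.
for sample in ("A4", "4F", "5F3401"):
    tag = read_tag(bytes.fromhex(sample))
    print(sample, "->", tag.hex().upper(), f"({len(tag)} byte tag)")
```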

Till now we have not addressed the pink and orange coded bits, which identify the class of a tag and the nature of its structure.

  • Class of a tag, coded in pink (bits b8 and b7), identifies the usage context of the tag - such as universal usage (it can be used anywhere), payment industry usage, airline ticketing usage etc.
  • Nature of the structure, coded in orange (bit b6), says something about the data - whether it is a primitive (such as INTEGER, BOOLEAN, STRING etc.) or a composite structure. If the data is composite, the tag is said to be constructed.

Class can be summarized as:

  1. 00b6b5b4b3b2b1 is the Universal class (INTEGER, BOOLEAN, STRING etc.). You can read about them quickly here
  2. 01b6b5b4b3b2b1 is the Application class, used by a specific industry. For example, all tags used by the payment industry (EMV) can be seen here
  3. 10b6b5b4b3b2b1 is the Context-specific class, used within the context of an application
  4. 11b6b5b4b3b2b1 is the Private class, not exposed to entities outside the application scope.
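
Putting the class bits and the primitive/constructed bit together, a recursive walk over TLV data might look like the sketch below. The function name `parse_tlv` and the sample bytes are made up for illustration, and the code assumes well-formed input with tags of at most two bytes; it does no bounds checking.

```python
CLASS_NAMES = ("Universal", "Application", "Context-specific", "Private")


def parse_tlv(data: bytes, depth: int = 0) -> list:
    """Return (depth, tag hex, class name, constructed, value) rows."""
    rows = []
    pos = 0
    while pos < len(data):
        first = data[pos]
        # Tag: one byte, or two when the low five bits are all ones.
        tag_len = 2 if first & 0x1F == 0x1F else 1
        tag = data[pos:pos + tag_len]
        pos += tag_len
        # Length: short form, or 0x81 / 0x82 followed by the length bytes.
        length = data[pos]
        pos += 1
        if length & 0x80:
            count = length & 0x7F
            length = int.from_bytes(data[pos:pos + count], "big")
            pos += count
        value = data[pos:pos + length]
        pos += length
        constructed = bool(first & 0x20)       # bit b6
        tag_class = CLASS_NAMES[first >> 6]    # bits b8 b7
        rows.append((depth, tag.hex().upper(), tag_class, constructed, value))
        if constructed:
            # A constructed value is itself a run of TLV objects.
            rows.extend(parse_tlv(value, depth + 1))
    return rows


# Made-up sample: a constructed object 6F wrapping 4F and 5F34.
sample = bytes.fromhex("6F084F02A0005F340101")
for depth, tag, tag_class, constructed, value in parse_tlv(sample):
    kind = "constructed" if constructed else "primitive"
    print("  " * depth + f"{tag} [{tag_class}, {kind}] {value.hex().upper()}")
```

The constructed bit is what drives the recursion: the parser never needs to know what a tag means, only whether its value should be opened up again.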