Introduction
Representing data in XML is all the rage these days, especially since the J2EE platform took off a decade ago, distributed computing with CORBA and RMI fell out of fashion, and Web Services adopted XML as their data transport format in the guise of SOAP. In the other part of the world, the one that works with embedded systems, binary transport is still the rule of the day. For someone like me, who walked into embedded computing from enterprise application development, this is sometimes a bit strange and at other times interesting.
The situation described here is depicted in the figure below:
I have some magic in the smart card (MIFARE DESFire) shown in the picture, which is read by a reader connected to my PC via a USB cable. An application on my PC talks to this reader and exchanges data with the smart card. In this article, I am only interested in the data format used in that exchange. The actual contents of the data, and what can be done with them, are dealt with later.
ASN.1 is an abstract notation for describing data: it defines primitive types, composite types and structures, but not how they are laid out in bytes. Suppose some payment-specific data is defined as follows:
TRACK1 ::= SEQUENCE
{
    permanentAccountNumber INTEGER,
    cardHolderName IA5String,
    expirationDate DATE,
    serviceCode INTEGER
}
Now, this is just an abstract representation of how the data MUST be structured. How the data is actually encoded is defined by encoding rules such as BER, CER, DER and XER. The relation between ASN.1 and each of these encoding rules is akin to that between an abstract class (ASN.1) and a concrete class (the encoding rule). If you intend to have a better grasp of ASN.1, nothing helps you better than this reference. Now we move on to the topic of TLV data encoding.
T[ag] L[ength] V[alue] Structure
In this encoding, the data is represented as follows:
Every piece of data is identified by its tag, followed by the length of the data (value) and then the data payload (value) itself. This definition is recursive, in the sense that the value may itself contain data in TLV structure.
Under BER, ASN.1 instances are coded in TLV. Every data object consists of a Tag, a Length and a Value. The tag defines whether the object is an integer, a boolean, a structure or something else. For example, a vocabulary might be defined such as:
30: Sequence
0C: UTF-8 string
02: Integer
01: Boolean
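As a minimal sketch of the idea, the vocabulary above can be used to build a nested TLV object by hand. The helper name `tlv` and the sample values ("Bob", 42, true) are made up for illustration, and the sketch assumes one-byte tags and values shorter than 128 bytes so that the length fits in a single byte.

```python
def tlv(tag: int, value: bytes) -> bytes:
    """Encode one TLV object (hypothetical helper).

    Assumes a one-byte tag and a value shorter than 128 bytes,
    so the length fits in a single byte.
    """
    if not 0 <= tag <= 0xFF:
        raise ValueError("this sketch only handles one-byte tags")
    if len(value) > 127:
        raise ValueError("this sketch only handles lengths below 128")
    return bytes([tag, len(value)]) + value


# Three primitive objects using the illustrative vocabulary.
name = tlv(0x0C, "Bob".encode("utf-8"))  # UTF-8 string
age = tlv(0x02, bytes([42]))             # Integer 42 in one byte
active = tlv(0x01, bytes([0xFF]))        # Boolean true

# The value of the sequence is itself a run of TLV objects.
record = tlv(0x30, name + age + active)

print(" ".join(f"{b:02X}" for b in record))
# prints: 30 0B 0C 03 42 6F 62 02 01 2A 01 01 FF
```

Reading the output left to right: tag 30, length 0B (11 bytes), and then the three inner objects, each with its own tag and length.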
The above is for illustration only. An industry can create its own vocabulary of tags, such as {0x30, 0x0C, 0x02, 0x01}, that is understood by everyone in that industry. For consistency, however, ISO 7816-4 defines a set of rules for how tags MAY be defined by a specific industry. These rules answer the following questions:
- How is a single-byte tag defined?
- How is a multi-byte tag defined and identified?
- How is a length coded on one byte?
- How is a length coded on multiple bytes?
- How is a tag defined whose meaning is understood universally?
- How is a tag defined whose meaning is understood by a specific application?
- How is a tag defined that has a meaning only within a specific context?
- How is a tag defined that is privately understood, so that other applications need not understand it?
These questions are best answered visually in the two figures below, one illustrating tag encoding and the other length encoding. The color coding is:
- Class of a tag (pink)
- Whether the tag is primitive or constructed (orange)
- Value of a tag
- Bits that are turned on
- Bits that are turned off
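BER (Basic Encoding Rules), Tag Encoding
Tag on one byte:
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
class | class | P/C | value | value | value | value | value |
Tag on two bytes, first byte:
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
class | class | P/C | 1 | 1 | 1 | 1 | 1 |
Tag on two bytes, second byte:
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
value | value | value | value | value | value | value | value |
BER (Basic Encoding Rules), Length Encoding
Length on one byte (0 to 127):
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
0 | length | length | length | length | length | length | length |
Length on two bytes (128 to 255), first byte:
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Length on two bytes, second byte:
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
length | length | length | length | length | length | length | length |
Length on three bytes (256 to 65535), first byte:
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Length on three bytes, second and third bytes:
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|---|
length | length | length | length | length | length | length | length |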
Now it is easy to derive the rules for encoding any data in TLV format as specified in ISO 7816-4. Writing the bits of one byte as b8b7b6b5b4b3b2b1, we can derive the following rules from the representation above:
- When the tag is encoded on one byte, it has the form xxxb5b4b3b2b1, where the last 5 bits denote the tag value (and are not all 1); each x may be 0 or 1.
- When the tag is encoded on two bytes, it has the form xxx11111 b8b7b6b5b4b3b2b1, where the first byte carries the class and structure bits and the second byte contains the tag value; each x may be 0 or 1.
- If the length of the value (data) is < 128, it is encoded on one byte, 0b7b6b5b4b3b2b1. In other words, if the most significant bit (b8) of the first length byte is turned off, the length is coded on that single byte.
- If the length of the value (data) is between 128 and 255, the length is encoded on 2 bytes in the form 10000001 b8b7b6b5b4b3b2b1. If the most significant bit of the first length byte is not 0, the length is > 127.
- If the length of the value (data) is between 256 and 65535, the length is encoded on 3 bytes in the form 10000010 b8b7b6b5b4b3b2b1 b8b7b6b5b4b3b2b1.
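The three length rules above can be sketched in a few lines. The function names `encode_length` and `decode_length` are hypothetical, and the sketch deliberately stops at 65535, the largest length covered by the rules listed here.

```python
from typing import Tuple


def encode_length(n: int) -> bytes:
    """Encode a value length on 1, 2 or 3 bytes (hypothetical helper)."""
    if n < 0:
        raise ValueError("length cannot be negative")
    if n < 0x80:
        return bytes([n])                         # 0xxxxxxx
    if n <= 0xFF:
        return bytes([0x81, n])                   # 10000001 + 1 byte
    if n <= 0xFFFF:
        return bytes([0x82]) + n.to_bytes(2, "big")  # 10000010 + 2 bytes
    raise ValueError("lengths above 65535 are outside this sketch")


def decode_length(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (length, number of length bytes consumed)."""
    first = data[offset]
    if first < 0x80:
        return first, 1                           # b8 is off: single byte
    count = first & 0x7F                          # number of bytes that follow
    if count not in (1, 2):
        raise ValueError("only the 0x81 and 0x82 forms are handled here")
    body = data[offset + 1 : offset + 1 + count]
    if len(body) != count:
        raise ValueError("truncated length field")
    return int.from_bytes(body, "big"), 1 + count


for n in (5, 127, 128, 255, 256, 65535):
    print(n, encode_length(n).hex().upper())
```

For example, a length of 128 no longer fits in seven bits, so it comes out as 81 80, while 256 needs the three-byte form 82 01 00.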
Tag | Representation | Classification | Purpose |
---|---|---|---|
A4 | 10100100 | Single Byte Tag | Select Command |
4F | 01001111 | Single Byte Tag | Applet AID |
5F34 | 01011111 00110100 | 2 Byte Tag |
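The classification column in the table can be checked mechanically: a tag continues onto a second byte exactly when the low five bits of its first byte are all 1. The sketch below uses a hypothetical helper `read_tag` and assumes tags are at most two bytes long, as in the rules described in this article.

```python
def read_tag(data: bytes, offset: int = 0) -> bytes:
    """Return the tag bytes starting at offset (hypothetical helper).

    Assumes tags of one or two bytes only.
    """
    first = data[offset]
    if first & 0x1F != 0x1F:
        return data[offset : offset + 1]   # b5..b1 hold the tag value
    if offset + 1 >= len(data):
        raise ValueError("two-byte tag is truncated")
    return data[offset : offset + 2]       # value is on the second byte


for tag_hex in ("A4", "4F", "5F34"):
    tag = read_tag(bytes.fromhex(tag_hex))
    bits = " ".join(f"{b:08b}" for b in tag)
    print(f"{tag_hex}: {bits} ({len(tag)} byte tag)")
# prints:
# A4: 10100100 (1 byte tag)
# 4F: 01001111 (1 byte tag)
# 5F34: 01011111 00110100 (2 byte tag)
```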
Till now we have not addressed the pink and orange coded bits, which identify the class of a tag and the nature of its structure.
- The class of a tag, coded in pink (bits b8 and b7), identifies the usage context of the tag: universal usage (it can be used anywhere), payment industry usage, airline ticketing usage, etc.
- The nature of the structure, coded in orange (bit b6), says something about the data: is it a primitive (such as INTEGER, BOOLEAN or STRING) or a composite structure? If the data is composite, the tag is said to be constructed.
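In BER, bit b6 set to 1 marks a constructed tag, and this is what lets a parser know when to descend into a value. A minimal recursive sketch follows. The function name `parse_tlv` and the sample bytes are made up for illustration, and the sketch assumes one-byte tags and one-byte lengths to keep the recursion in focus.

```python
from typing import List, Tuple, Union

Node = Tuple[int, Union[bytes, list]]


def parse_tlv(data: bytes) -> List[Node]:
    """Parse consecutive TLV objects into (tag, value) pairs.

    Hypothetical helper: assumes one-byte tags and lengths below 128.
    A constructed tag (b6 = 1) has its value parsed recursively.
    """
    nodes: List[Node] = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = data[pos], data[pos + 1]
        if tag & 0x1F == 0x1F or length & 0x80:
            raise ValueError("multi-byte tags and lengths are outside this sketch")
        value = data[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        if tag & 0x20:                       # constructed: value holds more TLVs
            nodes.append((tag, parse_tlv(value)))
        else:                                # primitive: keep the raw bytes
            nodes.append((tag, value))
        pos += 2 + length
    return nodes


# A sequence (30) holding a UTF-8 string, an integer and a boolean.
sample = bytes.fromhex("300B0C03426F6202012A0101FF")
tree = parse_tlv(sample)
print(tree)
```

The result is a single node for tag 0x30 whose value is a list of three primitive nodes, mirroring the nesting of the bytes.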
Class can be summarized as:
- 00b6b5b4b3b2b1 is the Universal class (INTEGER, BOOLEAN, STRING, etc.). You can read about them quickly here
- 01b6b5b4b3b2b1 is the Application class, whose tags are used by a specific industry. For example, all tags used by the payment industry (EMV) can be seen here
- 10b6b5b4b3b2b1 is the Context-specific class, used within the context of an application.
- 11b6b5b4b3b2b1 is the Private class, not exposed to entities outside the application scope.
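The class and structure bits can be read off the first tag byte with a shift and a mask. The sketch below uses a hypothetical helper `describe_tag`, and relies on the BER convention that b6 = 1 means constructed.

```python
from typing import Tuple

# Bits b8 b7 of the first tag byte select the class.
CLASS_NAMES = {
    0b00: "Universal",
    0b01: "Application",
    0b10: "Context-specific",
    0b11: "Private",
}


def describe_tag(first_byte: int) -> Tuple[str, str]:
    """Return (class name, 'primitive' or 'constructed') for a tag's first byte."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte")
    tag_class = CLASS_NAMES[first_byte >> 6]                          # b8 b7
    structure = "constructed" if first_byte & 0x20 else "primitive"   # b6
    return tag_class, structure


for byte in (0x30, 0x0C, 0x4F, 0x5F, 0xA4):
    tag_class, structure = describe_tag(byte)
    print(f"{byte:02X}: {tag_class}, {structure}")
# prints:
# 30: Universal, constructed
# 0C: Universal, primitive
# 4F: Application, primitive
# 5F: Application, primitive
# A4: Context-specific, constructed
```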