Information and Links

Join the fray by commenting, tracking what others have to say, or linking to it from your blog.


Other Posts
Distinguished Fields vs. Property Fields
Message Schema Resolution - The Simple Approach

Beyond the Flat File Wizard in BTS 2006 - Part 1

Posted by Mario Harik on December 6th, 2006

I was recently helping one of our clients parse a record-based flat file where the records are mostly positional identified by a tag. The issue was that the flat file was a credit card transaction log where each record, identified by a tag, is a sibling to the other records with no specific order in which the records appear. Such a scenario cannot be handled by an “out of the box”  usage of the FF wizard. This series of postings will focus on explaining how BizTalk Flat File schemas are structured, analyze the output of the FF wizard and how to customize it.

BizTalk handles Flat Files through the use of flat file extensions defined within schema annotations (more about annotations…, more about flat file extensions…). A receive pipeline component known as a Flat File Disassembler (or assembler for the send counterpart) can act on these extensions to parse an incoming flat file into XML. This disassembler is one of the standard components packaged with BizTalk and can be found in the Microsoft.BizTalk.Pipeline.Components.dll assembly under “C:\Program Files\Microsoft BizTalk Server 2006\Pipeline Components\” (for a default installation). The FF assembler and disassembler components in this assembly are FFDasmComp and FFAsmComp.

Although the Flat File wizard is a really nice new feature of BTS 06, it does make it easier for developers to get by without understanding the different options available with FF extensions to meet their needs. This series of posts will be example driven to explain what the wizard generates and go over some common flat file scenarios where the developer will need to update the schema properties vs. using the output of the wizard.

Example 1: A Common CSV File
The first example is a simple comma separated file of contacts that has three fields: First Name, Last Name and Age. The following is what a sample Contacts.csv file looks like:

    1 Mario, Harik, 32

    2 Foo , Bar , 12

    3 John , Dough, 47

 

Creating a flat file schema from this file using the wizard is done in two steps, the first one is to specify the record delimiter of new line (”{CR}{LF}”) with repeating records for the root element and the second is to specify each of the record’s delimiter of comma (”,”). The following are the essential steps of the creation process:

  1. Add a new Flat File Wizard Schema to your project.
  2. Select the flat file you want to create a schema for (contacts.csv in this example) as your instance.
  3. Give the root node a name of Contacts and assign the proper namespace based on your naming convention.
  4. Enter your record delimiter of data under the root node which is {CR}{LF} in this case.
  5. The wizard will then allows you to specify the child elements of the selected data based on the specified delimiter. Our first example is a simple one where it contains a set of identical repeating records.
  6. The wizard will then reiterate over all the records or repeating records to define how they will be parsed.
  7. In this example, the second iteration will define the contact record on the contacts schema. This time however, select “,” (comma) as being the delimiter.
  8. Enter the child elements information as shown below.
  9. And this would create the flat file schema that will be used by the flat file disassembler or assembler to be able to produce an XML document out of the contacts CSV file or vice versa. Before we start looking into non-wizard supported FF customizations, we’ll first analyze the output of the wizard for this example.

The schema generated by the FF wizard looks as follows:

The first thing we notice in this schema are the schema-level annotations that specify that we’re using the flat file extensions:

    3 <xs:annotation>

    4     <xs:appinfo>

    5       <schemaEditorExtension:schemaInfo namespaceAlias=b extensionClass=Microsoft.BizTalk.FlatFileExtension.FlatFileExtension standardName=Flat File xmlns:schemaEditorExtension=http://schemas.microsoft.com/BizTalk/   2003/SchemaEditorExtensions />

    6       <b:schemaInfo standard=Flat File codepage=1200 default_pad_char= pad_char_type=char count_positions_by_byte=false parser_optimization=speed lookahead_depth=3 suppress_empty_nodes=false generate_empty_nodes=true allow_early_termination=false early_terminate_optional_fields=false allow_message_breakup_of_infix_root=false compile_parse_tables=false root_reference=Contacts />

    7     </xs:appinfo>

    8   </xs:annotation>  

 

Underneath the root node, the flat file wizard added the following annotations:

Underneath the root node, the flat file wizard added the following annotations:

   11 <xs:appinfo>

   12         <b:recordInfo structure=delimited child_delimiter_type=hex child_delimiter=0xD 0xA child_order=postfix sequence_number=1 preserve_delimiter_for_empty_data=true suppress_trailing_delimiters=false />

   13       </xs:appinfo>

 

These specify that the structure is delimited (vs. positional) where the child_delimiter_type is of type hex (vs. char).  The carriage return and line feed are represented with their hex ascii representation of 13 (hex=D) and 10 (hex=A). When specifying hex delimiters, they have to be separated with a space and prefixed with 0x.

These annotations will instruct the FF assembler or disassembler to map one Contact element to a line in the corresponding flat file.

The instructions on how to interpret the elements within each Contact are specified on the Contact node with the following annotations:

   24 <xs:appinfo>

   25               <b:recordInfo structure=delimited child_delimiter_type=char child_delimiter=, child_order=infix sequence_number=1 preserve_delimiter_for_empty_data=true suppress_trailing_delimiters=false />

   26             </xs:appinfo>

 

These elements are also delimited but using commas instead of carriage return and line feed.

Using these default settings, and parsing the incoming CSV file, the output XML file is as follows:

    1 <Contacts xmlns=http://Tallan.BizTalk.SQLAdapterUpdategrams.Contacts_FF>

    2     <Contact xmlns=“”>

    3         <FirstName>Mario</FirstName>

    4         <LastName>Harik</LastName>

    5         <Age>32</Age>

    6     </Contact>

    7     <Contact xmlns=“”>

    8         <FirstName>Foo</FirstName>

    9         <LastName>Bar</LastName>

   10         <Age>12</Age>

   11     </Contact>

   12     <Contact xmlns=“”>

   13         <FirstName>John</FirstName>

   14         <LastName>Dough</LastName>

   15         <Age>47</Age>

   16     </Contact>

   17 </Contacts>

 

Simple Customizations
Character Wrapping or Escape Delimiters

In this example, it is unlikely for delimiters to appear within the boundaries of one of the Contact elements but this is a common scenario in CSV files. These are usually dealt with using wrapping characters or escape characters; for example:
“This is my, first name”, last name, 45
or
This is my \, first name, last name, 45

With our wizard generated schema, the following flat file:

    1 “Ma,rio”, Har\,ik, 32

 

results in the following XML:

    2 <Contact xmlns=“”>

    3   <FirstName>”Ma</FirstName>

    4   <LastName>rio”</LastName>

    5   <Age>Har\,ik, 32</Age>

    6   </Contact>

    7 </Contacts>

 

Which is not the desired result. Adding a wrap character to the field’s annotation can take care of nested commas; this is done on the element that needs to support wrapping (in this case, we add it to the FirstName element’s annotation):

   35 <xs:element name=FirstName type=xs:string>

   36                 <xs:annotation>

   37                   <xs:appinfo>

   38                     <b:fieldInfo justification=left sequence_number=1 wrap_char_type=char wrap_char= />

   39                   </xs:appinfo>

   40                 </xs:annotation>

   41               </xs:element>

 

Specifying an escape character such as a “\” can be done on the record info annotation:

   24 <xs:appinfo>

   25               <b:recordInfo structure=delimited child_delimiter_type=char child_delimiter=, child_order=infix sequence_number=1 preserve_delimiter_for_empty_data=true suppress_trailing_delimiters=false escape_char_type=char escape_char=\ />

   26             </xs:appinfo>

 

These settings will then cause the line shown above to be parsed as:

    1 <Contact xmlns=“”>

    2     <FirstName>Ma,rio</FirstName>

    3     <LastName>Har,ik</LastName>

    4     <Age>32</Age>

    5 </Contact>

 

In the next post in this series, I’ll cover some of the other properties of flat files and how to use them to add further customizations than the one provided by the wizard.



Write a Comment

Take a moment to comment and tell us what you think. Some basic HTML is allowed for formatting.

Reader Comments

HI,

I would like to appreciate the solution that you have mentioned here and that has helped me a lot but there is a small query if you can help me in this I need to use biztalk but in my case instead of flat file I will be having a mdb(Access Database) file. So please can you give me a solution of how should I convert that in xml in biztalk

Hi Manuj,
You cannot get an Access MDB file into BizTalk using the flat file diassembler. You will either have to use a custom pipeline component or an adapter. The custom pipeline component can use ADO.NET to be able to query the MDB file for the data you want to build an XML message from. You can get an ODBC adapter from the Top XML BizTalk utilities that can be found at: http://www.topxml.com/biztalkutilities/. We’ll post a few entries about this in the near future on how to implement both approaches.