Parse Flat Files (VDA, CSV)

Admin

Updated April 24, 2023 09:31

Parsing flat files (CSV, VDA) with Flat Reader

Internally, translating a custom file with EDI Tools for .NET is a two-step process:

Identify and match each message in the flat file to an EDI template
For each message, transpose its contents to the matching EDI template

The translation of a custom flat file (CSV, VDA, etc.) follows this path (drawn in blue in the original diagram):

A flat file
Is processed through a Flat reader (EdiFabric)
To produce a list of .NET objects which are instances of EDI templates

EDI Tools for .NET translates custom flat files with a depth-first search (DFS) algorithm. Each message in the flat file is transposed into a C# instance of the corresponding EDI template class.

FlatReader provides fast, non-cached, forward-only access to flat file data. FlatReader methods let you move through the messages contained in the flat file.

FlatReader implements IDisposable and should be disposed of either directly (by calling Dispose) or indirectly (with a using statement).

FlatReader Reference

FlatReader can read flat files of any type, such as CSV, VDA, positional (incl. of varying length), delimited (including nested items with secondary delimiter), or a mixture of delimited and positional lines.

Flat File Delimiters

FlatReader can only read flat files that consist of lines:

A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").
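The line definition above is the same one the standard .NET TextReader.ReadLine method uses, so it can be illustrated with a small sketch. The FlatLines class is a hypothetical helper written for this illustration; it is not part of EDI Tools for .NET and makes no claim about how FlatReader splits lines internally.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class FlatLines
{
    // Splits text into lines terminated by "\n", "\r" or "\r\n".
    // A final line without a terminator is returned as well.
    public static List<string> Split(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Three different terminators separate four lines.
        var text = "H,1999-10-20\nB,Alice,Smith\rB,Robert,Smith\r\nT,1999-10-21";
        foreach (var line in Split(text))
        {
            Console.WriteLine(line);
        }
    }
}
```

All three terminators produce the same result, so a file that mixes Windows and Unix line endings still yields one entry per line.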

Properties of FlatReader

The properties of the class reflect the value of the current node, which is where the reader is positioned.

The Item property holds the current message.

Configuration of FlatReader

FlatReader can be configured with the following additional settings:

FlatReaderSettings Reference

Common Settings:

ContinueOnError - forces the parser to continue past an exception. By default, parsing stops when an exception is encountered.
Split - enables splitting. By default, a template that is marked for splitting is still translated as if it had not been marked. To explicitly enable splitting, set this to true.

Matching a message to an EDI template

Flat files can contain batches of top-level structures, called messages. For example, a VDA 4905 flat file can contain multiple delivery instructions, and a custom purchase order file can contain multiple purchase orders. Each of these "messages" needs to be represented by an EDI template, into which FlatReader parses the message.

Before a message can be parsed, the reader needs to identify the EDI template (the C# class) that corresponds to it. To do so, the reader must be able to recognize the start of a new message from the characters at the beginning of a line.

Let's take a look at the flat file below. It contains two messages: a purchase order and another one that we'll call "markers". The two messages are represented by two different templates:

PO1999-10-20
US        Alice Smith         123 Maple Street    Mill Valley    CA 90952
US        Robert Smith        8 Oak Avenue        Old Town       PA 95819
ITEMS,ITEM872-AA|Lawnmower|1|148.95|Confirm this is electric,ITEM926-AA|Baby Monitor|1|39.98|Confirm this is electric

H,1999-10-20
B,Alice,Smith
CorruptBodyTag,Robert,Smith
B,Robert,Smith,test
T,1999-10-21

The two messages are identified by the following tags:

Every line that begins with "PO" is the start of a new PO message
Every line that begins with "H," is the start of a new Markers message

It is up to the creator of the file to ensure that messages can be unambiguously identified and to communicate the exact rules on how to identify each message.
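The two tag rules above can be tried out on their own with a short sketch. MessageSplitter is a hypothetical helper written for this illustration: it only groups lines by start tag and does not reproduce how FlatReader matches or parses messages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class MessageSplitter
{
    // Returns the message name if the line starts a new message, otherwise null.
    public static string StartTag(string line)
    {
        if (line.StartsWith("PO", StringComparison.Ordinal)) return "PO";
        if (line.StartsWith("H,", StringComparison.Ordinal)) return "Markers";
        return null;
    }

    // Groups lines into messages. Blank lines and lines that appear
    // before the first start tag are skipped.
    public static List<KeyValuePair<string, List<string>>> Group(IEnumerable<string> lines)
    {
        var messages = new List<KeyValuePair<string, List<string>>>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            var name = StartTag(line);
            if (name != null)
            {
                messages.Add(new KeyValuePair<string, List<string>>(name, new List<string>()));
            }

            if (messages.Count > 0)
            {
                messages[messages.Count - 1].Value.Add(line);
            }
        }
        return messages;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "PO1999-10-20",
            "US        Alice Smith",
            "US        Robert Smith",
            "ITEMS,ITEM872-AA|Lawnmower|1|148.95|Confirm this is electric",
            "",
            "H,1999-10-20",
            "B,Alice,Smith",
            "CorruptBodyTag,Robert,Smith",
            "B,Robert,Smith,test",
            "T,1999-10-21"
        };

        foreach (var message in Group(lines))
        {
            Console.WriteLine(message.Key + ": " + message.Value.Count + " lines");
        }
        // prints:
        // PO: 4 lines
        // Markers: 5 lines
    }
}
```

In this sketch, any body line that happened to begin with "PO" or "H," would be treated as the start of a new message, which is why the tags must be unambiguous across the whole file.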

To tell the reader how to resolve each message to the correct template, you create a flat factory that returns a MessageContext object. So, for our two messages above, we can implement the following factory:

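// Requires: using System; using System.Reflection;
private static MessageContext FlatFactory(string segment)
{
    // StartsWith is safe for lines shorter than two characters,
    // where Substring(0, 2) would throw.

    // Lines that begin with "PO" start a new PO message.
    if (segment.StartsWith("PO", StringComparison.Ordinal))
        return new MessageContext("PO", "Flat", mc => Assembly.Load(new AssemblyName("EdiFabric.Examples.FlatFile.Common")));

    // Lines that begin with "H," start a new Markers message.
    if (segment.StartsWith("H,", StringComparison.Ordinal))
        return new MessageContext("Markers", "Flat", mc => Assembly.Load(new AssemblyName("EdiFabric.Examples.FlatFile.Common")));

    // Any other line does not start a new message.
    return null;
}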

MessageContext Reference

The MessageContext requires three parameters to match a message to an EDI template. Their values are compared with the values of the MessageAttribute specified in the EDI template:

The name that matches the name in the MessageAttribute
The standard (usually Flat) that matches the standard in the MessageAttribute
The assembly that contains the template class

The template for the PO message is defined as:

[Message("Flat", "PO")]
public class FlatPO : EdiMessage

The template for the Markers message is defined as:

[Message("Flat", "Markers")]
public class FlatMarkers : EdiMessage

The FlatReader searches for a matching C# class through all the classes in the .NET assembly configured in the third parameter.
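The idea of locating a class by the values of its attribute can be sketched with plain .NET reflection. Everything in the block below is a stand-in invented for this illustration (DemoMessageAttribute, DemoFlatPO, DemoFlatMarkers, TemplateLookup); it does not use the EdiFabric types and is only an assumption about the general technique, not the library's actual lookup code.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-in for a message attribute; not the EdiFabric MessageAttribute.
[AttributeUsage(AttributeTargets.Class)]
public sealed class DemoMessageAttribute : Attribute
{
    public string Standard { get; }
    public string Name { get; }

    public DemoMessageAttribute(string standard, string name)
    {
        Standard = standard;
        Name = name;
    }
}

[DemoMessage("Flat", "PO")]
public class DemoFlatPO { }

[DemoMessage("Flat", "Markers")]
public class DemoFlatMarkers { }

public static class TemplateLookup
{
    // Returns the first class in the assembly whose attribute matches
    // both the standard and the name, or null if there is none.
    public static Type Find(Assembly assembly, string standard, string name)
    {
        return assembly.GetTypes().FirstOrDefault(type =>
        {
            var attribute = type.GetCustomAttribute<DemoMessageAttribute>();
            return attribute != null
                && attribute.Standard == standard
                && attribute.Name == name;
        });
    }

    public static void Main()
    {
        var type = Find(typeof(TemplateLookup).Assembly, "Flat", "Markers");
        Console.WriteLine(type.Name); // prints: DemoFlatMarkers
    }
}
```

A lookup of this kind returns nothing when either the name or the standard differs from the attribute values, which is why both must match the template exactly.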

Reader exceptions and partial parsing

There are two types of failures that the FlatReader can encounter - when the flat file can't be read at all, and when a message can't be parsed to its corresponding EDI template.

Flat file can't be parsed

In the event that the flat file is corrupt and can't be read at all, FlatReader does not throw exceptions but instead returns a ReaderErrorContext. No messages can be matched to EDI templates.

// ediItems holds the items collected from the reader (requires using System.Linq)
var readerErrors = ediItems.OfType<ReaderErrorContext>().ToList();
if (readerErrors.Any())
{
    // The stream is corrupt. Reject it and report back to the sender
    foreach (var readerError in readerErrors)
    {
        // Respond with the error context,
        // which contains the standard EDI error code and fault reason
        var error = readerError.MessageErrorContext.Flatten();
    }
}

ReaderErrorContext Reference

A message within the flat file can't be parsed

In this case, the message has been matched to an EDI template but can't be parsed to its .NET POCO because of one of the following conditions:

An unrecognizable line (the line can be identified by neither length nor tag)
An improperly positioned line (the line can't be positioned to any item in the template)
A line that can't be parsed (the line has more fields than specified in the EDI template)

Upon reaching any of the conditions above, the parsing of the flat file stops. The ErrorContext property of EdiMessage (the base class of every .NET POCO) is populated with the relevant error details, and the HasErrors property of EdiMessage is set to true.

EdiMessage Reference

The HasErrors property of EdiMessage indicates whether the message was parsed without problems (HasErrors = false) or only partially parsed (HasErrors = true).

FlatReader supports a Continue-On-Error mode which forces the parser to continue towards the end of the flat file regardless of any errors. To enable it, set the ContinueOnError parameter of the reader settings to true.

Read Flat Files

FlatReader only has a streaming mode, which needs to be executed in a loop like this:

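// Requires: using System.Collections.Generic; using System.IO; using System.Text;
List<IEdiItem> items = new List<IEdiItem>();

// In a verbatim string (@"...") a single backslash is enough.
using (Stream flatStream = File.OpenRead(@"C:\Flat_PO.txt"))
using (StreamReader streamReader = new StreamReader(flatStream, Encoding.UTF8, true, 1024))
using (var flatReader = new FlatReader(streamReader, FlatFactory))
{
    // Each successful Read() makes the next item available in Item.
    while (flatReader.Read())
    {
        items.Add(flatReader.Item);
    }
}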

Examples in GitHub:

Guidelines for creating EDI templates

Although the templates represent flat files and not EDI files, we'll keep calling them EDI templates for the sake of consistency as they are essentially the same thing.

What are EDI templates?

FlatReader supports multiple variants of flat files that contain lines defined as positional, delimited, or a mixture of both. This guideline explains how to represent the different types of lines in the EDI template.

How to identify lines?

Lines with tags

Lines that can be identified with tags must be defined as a segment with an ID equal to that of the tag, e.g. the following line:
```
PO2019-12-17
```
that can be identified with the tag PO, should be represented as:
```
[Segment("PO")]
public class Header
```

Lines without tags

Lines that can't be identified with tags must be defined as a segment with an empty ID. These lines must have lines with tags before and after them, e.g. the following line:

US        David Pastrnak      100 Legends Way     Boston         MA 02114

which has no tag because its first value (the country code) can differ from line to line, should be represented as:

[Segment("")]
public class Customer

and located between lines with tags:

[Message("Flat", "PO")]
public class FlatPO : EdiMessage
{
	[Required]
	[Pos(1)]
	public Header Header { get; set; }

	[Required]
	[Pos(2)]
	public List<Customer> Customers { get; set; }

	[Required]
	[Pos(3)]
	public Items Items { get; set; }        
}

[Segment("H", ',')]
public class FlatHeader

[Segment("ITEMS", ',', '|')]
public class Items

How to represent lines?

Positional lines

Positional lines are represented as segments with fields in a specified order, each annotated with the [StringLength] attribute, where Min and Max are equal (the fixed width of the field).

The following positional line:

US        David Pastrnak      100 Legends Way     Boston         MA 02114

can be represented as:

[Segment("")]
public class Customer
{
	[Required]
	[StringLength(10, 10, false, ' ')]
	[Pos(1)]
	public string Country { get; set; }

	[Required]
	[StringLength(20, 20, false, ' ')]
	[Pos(2)]
	public string FullName { get; set; }

	[Required]
	[StringLength(20, 20, false, ' ')]
	[Pos(3)]
	public string Street { get; set; }

	[Required]
	[StringLength(15, 15, false, ' ')]
	[Pos(4)]
	public string City { get; set; }

	[Required]
	[StringLength(3, 3, false, ' ')]
	[Pos(5)]
	public string State { get; set; }

	[Required]
	[StringLength(5, 5, false, ' ')]
	[Pos(6)]
	public string PostCode { get; set; }
}
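To show what the fixed widths mean in practice, the sketch below slices a positional line using the widths from the [StringLength] values above (10, 20, 20, 15, 3, 5). PositionalLine is a hypothetical helper written for this illustration, and it assumes that fields are padded on the right with spaces; it does not use FlatReader.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class PositionalLine
{
    // Field widths taken from the [StringLength] values of the Customer segment.
    private static readonly int[] Widths = { 10, 20, 20, 15, 3, 5 };

    // Cuts the line into fixed-width fields and removes the trailing padding.
    public static List<string> Slice(string line)
    {
        var fields = new List<string>();
        var offset = 0;
        foreach (var width in Widths)
        {
            if (offset + width > line.Length)
            {
                throw new FormatException(
                    "Line has fewer than the required " + (offset + width) + " characters.");
            }

            fields.Add(line.Substring(offset, width).TrimEnd(' '));
            offset += width;
        }
        return fields;
    }

    public static void Main()
    {
        // Build the line with PadRight so that every field has its exact width.
        var line = "US".PadRight(10)
                 + "David Pastrnak".PadRight(20)
                 + "100 Legends Way".PadRight(20)
                 + "Boston".PadRight(15)
                 + "MA".PadRight(3)
                 + "02114";

        Console.WriteLine(string.Join("|", Slice(line)));
        // prints: US|David Pastrnak|100 Legends Way|Boston|MA|02114
    }
}
```

Because the fields are found by position only, a single missing or extra space shifts every field after it, so the widths in the template must match the file exactly.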

Delimited lines

Delimited lines are represented as segments with fields (including composite fields) in a specified order, together with a field delimiter and, optionally, a composite field delimiter.

The following line:

ITEMS,ITEM213-CC|Bauer Nexus 2N Pro|1|168.95|Hockey stick,ITEM323-CC|CCM RibCor 70K|1|500.98|Hockey skates

can be represented as:

[Serializable()]
[Segment("ITEMS", ',', '|')]
public class Items
{
    [Pos(1)]
    public string Tag { get; set; }

    [Pos(2)]
    public List<ItemDetail> ItemDetails { get; set; }
}

[Serializable()]
[Composite("ItemDetails")]
public class ItemDetail
{
    [Pos(1)]
    public string ProductCode { get; set; }

    [Pos(2)]
    public string Description { get; set; }

    [Pos(3)]
    public string Quantity { get; set; }

    [Pos(4)]
    public string UnitPrice { get; set; }

    [Pos(5)]
    public string Notes { get; set; }
}

Unexpected lines

Sometimes lines can be corrupt, unexpected, or irrelevant, and need to be excluded without breaking the parsing.

The following file has two lines starting with "Corrupt", which can appear at random positions in the body. We are only interested in the header (starting with "H,"), the body lines that start with "B,", and the trailer (starting with "T,"):

H,1999-10-20
B,Alice,Smith
CorruptBodyTag,Robert,Smith
B,Robert,Smith,test
CorruptBodyTag,Jimmy,Smith
B,Robert2,Smith2,test2
T,1999-10-21

To parse this file, define the template as:

using System;
using System.Collections.Generic;
using EdiFabric.Core.Annotations.Edi;
using EdiFabric.Core.Annotations.Validation;
using EdiFabric.Core.Model.Edi;

namespace EdiFabric.Templates.Vda
{
    [Serializable()]
    [Message("Flat", "Corrupt")]
    public class FlatCorrupt : EdiMessage
    {
        [Required]
        [Pos(1)]
        public FlatHeaderCorrupt Header { get; set; }

        [Required]
        [Pos(2)]
        public AllBody Body { get; set; }       
    }

    [Serializable()]
    [Segment("H", ',')]
    public class FlatHeaderCorrupt
    {
        [Required]
        [Pos(1)]
        public string Tag { get; set; }

        [Required]
        [Pos(2)]
        public string Date { get; set; }
    }

    [Serializable()]
    [Segment("T", ',')]
    public class FlatTrailerCorrupt
    {
        [Required]
        [Pos(1)]
        public string Tag { get; set; }

        [Required]
        [Pos(2)]
        public string Date { get; set; }
    }

    [Serializable()]
    [All()]
    public class AllBody
    {
        [Pos(1)]
        public List<FlatBodyCorrupt> Body { get; set; }

        [Required]
        [Pos(2)]
        public FlatTrailerCorrupt Trailer { get; set; }

        [Pos(3)]
        public List<InvalidBody> InvalidBody { get; set; }
    }

    [Serializable()]
    [Segment("B", ',')]
    public class FlatBodyCorrupt
    {
        [Required]
        [Pos(1)]
        public string Tag { get; set; }

        [Required]
        [Pos(2)]
        public string Name { get; set; }

        [Required]
        [Pos(3)]
        public string Surname { get; set; }
        
        [Pos(4)]
        public string Extra { get; set; }
    }

    [Serializable()]
    [Segment("", '\0')]
    public class InvalidBody
    {
        [Required]
        [Pos(1)]
        public string Data { get; set; }
    }
}

To be able to weed out the unwanted lines beginning with "Corrupt", define a separate segment with a single field as:

[Segment("", '\0')]

Then include a List<> of that segment as the last segment in an [All()] object:

[All()]
public class AllBody
{
    [Pos(1)]
    public List<FlatBodyCorrupt> Body { get; set; }

    [Required]
    [Pos(2)]
    public FlatTrailerCorrupt Trailer { get; set; }

    // Last segment
    [Pos(3)]
    public List<InvalidBody> InvalidBody { get; set; }
}
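The effect of the catch-all segment can be pictured with a small standalone sketch that sorts lines into recognized and unexpected ones by their tag. BodySorter is a hypothetical helper written for this illustration; it only mimics the outcome and makes no assumption about how FlatReader evaluates the [All()] object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class BodySorter
{
    private static readonly string[] KnownTags = { "H,", "B,", "T," };

    // Adds lines with a known tag to 'recognized' and every other line
    // to 'unexpected', preserving the original order within each list.
    public static void Sort(IEnumerable<string> lines, List<string> recognized, List<string> unexpected)
    {
        foreach (var line in lines)
        {
            var known = Array.Exists(KnownTags, tag => line.StartsWith(tag, StringComparison.Ordinal));
            (known ? recognized : unexpected).Add(line);
        }
    }

    public static void Main()
    {
        var lines = new[]
        {
            "H,1999-10-20",
            "B,Alice,Smith",
            "CorruptBodyTag,Robert,Smith",
            "B,Robert,Smith,test",
            "CorruptBodyTag,Jimmy,Smith",
            "B,Robert2,Smith2,test2",
            "T,1999-10-21"
        };

        var recognized = new List<string>();
        var unexpected = new List<string>();
        Sort(lines, recognized, unexpected);

        Console.WriteLine(recognized.Count + " recognized, " + unexpected.Count + " unexpected");
        // prints: 5 recognized, 2 unexpected
    }
}
```

The unexpected lines are kept rather than dropped, in the same way that the template collects them in the InvalidBody list.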

Composite lines

Lines can have two levels of nesting, separated by different delimiters.

The following line begins with the tag "ITEMS" and can contain multiple items delimited with a comma ",". Each item contains multiple sub-items, delimited with a pipe "|":

ITEMS,ITEM213-CC|Bauer Nexus 2N Pro|1|168.95|Hockey stick,ITEM323-CC|CCM RibCor 70K|1|500.98|Hockey skates

It can be represented as:

[Serializable()]
[Segment("ITEMS", ',', '|')]
public class Items
{
    [Pos(1)]
    public string Tag { get; set; }

    [Pos(2)]
    public List<ItemDetail> ItemDetails { get; set; }
}

[Serializable()]
[Composite("ItemDetails")]
public class ItemDetail
{
    [Pos(1)]
    public string ProductCode { get; set; }

    [Pos(2)]
    public string Description { get; set; }

    [Pos(3)]
    public string Quantity { get; set; }

    [Pos(4)]
    public string UnitPrice { get; set; }

    [Pos(5)]
    public string Notes { get; set; }
}

Sub-items are represented with the [Composite] attribute.

[Composite("ItemDetails")]
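To make the two delimiter levels concrete, the sketch below splits the sample line first by the field delimiter and then by the composite delimiter, using only String.Split. CompositeLine is a hypothetical helper written for this illustration and does not involve FlatReader or the template classes.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class CompositeLine
{
    // First level: fields separated by fieldDelimiter.
    // Second level: sub-items of each field separated by compositeDelimiter.
    public static string[][] Split(string line, char fieldDelimiter, char compositeDelimiter)
    {
        return line.Split(fieldDelimiter)
                   .Select(field => field.Split(compositeDelimiter))
                   .ToArray();
    }

    public static void Main()
    {
        var line = "ITEMS,ITEM213-CC|Bauer Nexus 2N Pro|1|168.95|Hockey stick,"
                 + "ITEM323-CC|CCM RibCor 70K|1|500.98|Hockey skates";

        var fields = Split(line, ',', '|');

        Console.WriteLine(fields.Length);    // prints: 3
        Console.WriteLine(fields[0][0]);     // prints: ITEMS
        Console.WriteLine(fields[1].Length); // prints: 5
        Console.WriteLine(fields[2][1]);     // prints: CCM RibCor 70K
    }
}
```

The first field is the tag, and each remaining field corresponds to one ItemDetail with five sub-items, in the order given by the [Pos] attributes.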

Read VDA files

Although VDA files can be read using FlatReader, there is a legacy VdaReader which can also be used to read VDA files.

VdaReader Reference

VdaReader offers predefined settings for VDA files (it reads lines of 128 characters) and can be used for files without line breaks, e.g. when all lines are chained one after the other without CR/LF breaks between them.

Stream ediStream = File.OpenRead(@"\..\..\..\Files\Vda_4905_02.txt");
List<IEdiItem> ediItems;
using (var ediReader = new VdaReader(ediStream, MessageContextFactory))
{
    ediItems = ediReader.ReadToEnd().ToList();
}
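Reading fixed-length records without line breaks amounts to cutting the data every 128 characters. The sketch below shows that idea with plain string operations. FixedRecords is a hypothetical helper written for this illustration and is only an assumption about the general approach, not a description of how VdaReader is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class FixedRecords
{
    // Cuts the data into records of recordLength characters.
    // The last record may be shorter if the data is not an exact multiple.
    public static List<string> Chunk(string data, int recordLength)
    {
        if (recordLength <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(recordLength));
        }

        var records = new List<string>();
        for (var offset = 0; offset < data.Length; offset += recordLength)
        {
            var length = Math.Min(recordLength, data.Length - offset);
            records.Add(data.Substring(offset, length));
        }
        return records;
    }

    public static void Main()
    {
        // Two full records followed by a shorter one, with no line breaks.
        var data = new string('A', 128) + new string('B', 128) + new string('C', 40);
        var records = Chunk(data, 128);

        Console.WriteLine(records.Count);     // prints: 3
        Console.WriteLine(records[2].Length); // prints: 40
    }
}
```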
