Settings for a Flat File (TXT, CSV)

This topic describes how to configure a new Flat File connection in DataSync.

To configure the connection:

  1. In DataSync, create a new source connection for a flat file.

  2. Fill out the fields as described below:

  3. Click Save.

Connection Properties Description
URI

The URI field allows you to specify a source file for your new connection. Both local and network file paths can be entered into the URI field. Also, source files may either be a single CSV file or a folder that contains multiple CSV files.

Include Types Select the file types to include: CSV, TXT, or both.
Delimiter Select the format to be used when parsing the files: Comma, Type, or Custom.
Quote parameter Specify the character to be used to quote values.
Top Rows to Skip If required, enter the number of rows to skip, starting from the top.
Exclude file extension If enabled, this property will remove the file extension from the destination table in the extraction.
First row contains headers Select this option if the first row contains headers.
Include subdirectories Select this option to read files and schema.ini in nested folders.
Use row numbers Select this option if you are updating or deleting CSV files. This option will create a new column called RowNumber, which will be used as key for the table.
Enable Pooling

This parameter enables connection pooling. When it is selected, the three following properties are displayed:

  • Pool idle timeout: The allowed idle time for a connection before it is closed.

  • Max pool size: The maximum number of connections in the pool.

  • Pool wait time: The maximum number of seconds to wait for an available connection.

 

The Additional Connection Properties panel allows to add parameters that are not specified in the Connection Properties panel. Additional properties can be set to Visible or Encrypted. The most common additional properties are listed below.

Additional Connection Properties Description
Charset

If your source file is encoded in ANSI format, you must add the Charset parameter to your configuration in the Additional Connection Properties pane, exactly as it appears in the image below (including the value—windows-1252):

If your source file is encoded in UTF-8 format, no additional parameter is needed.

Culture

By default, DataSync uses the decimal separator from your computer's Windows regional format settings.

For Flat File connections, source and destination files must use a decimal separator that is compatible with your DataSync configuration.

If the source or destination file uses a different decimal separator (e.g., a flat file that uses commas as the decimal separator but your regional settings are set to American English), you must add the Culture additional property to your configuration as follows:

  • Commas: fr-FR (or other regional value that uses commas)

  • Periods: en-US (or other regional value that uses periods)

Row Scan Depth

The Row Scan Depth additional property lets you define how many rows of your source file will be scanned to determine the data type of your table's columns.

The default scan is set to 100 rows. If you wish to alter that value, simply add a Row Scan Depth additional property and enter the desired value (e.g., 5000).

Note  

If you add a Row Scan Depth additional property with a value of 0 (zero), DataSync will scan the entirety of the flat file, regardless of its length. Scanning a larger amount of rows will result in longer processing times, but they will ultimately be more accurate.

AggregateFiles

This parameter aggregates all of the files located in the URI directory into a single table called AggregatedFiles.

By default, the first file in the folder is used to define the schema; however, the MetadataDiscoveryURI parameter can be specified to use a different file instead.

IncludeFiles

This parameter allows you to provide a comma-separated list of file extensions to include into the set of the files modeled as tables. The default value is "CSV,TXT,TAB". For example, IncludeFiles=TXT,TAB.

You can include files without an extension by specifying the NOEXT value.

You can also include archive files of these types: ZIP, TAR, and GZ. When archive files are found, they will be downloaded to the local machine so the driver can extract and parse the contained files. Files contained within an archive must match an extension listed in IncludeFiles to be included in the set of files modeled as tables.

File masks can be specified using an asterisk (*) to provide enhanced filtering capabilities; for example: IncludeFiles=2020*.csv,TXT.

Files specified in Schema.ini are included in addition to the files specified by this property.

It is also possible to specify date and time filters using CreatedDate and ModifiedDate. All extension filters are evaluated in disjunction (using OR operator), and then the resulting filter is evaluated in conjunction (using AND operator) with the date and time filters.

Examples:

IncludeFiles="TXT,CreatedDate<='2020-11-26T07:39:34-05:00'"

IncludeFiles="TXT,ModifiedDate<=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 50, 000)"

IncludeFiles="ModifiedDate>=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 49, 000),ModifiedDate<=CURRENT_TIMESTAMP()"

ExcludeFiles

This parameter provides a comma-separated list of file extensions to exclude from the set of the files modeled as tables.

It is also possible to specify date and time filters using CreatedDate and ModifiedDate. All extension filters are evaluated in disjunction (using OR operator), and then the resulting filter is evaluated in conjunction (using AND operator) with the date and time filters.

Examples:

ExcludeFiles="TXT,CreatedDate<='2020-11-26T07:39:34-05:00'"

ExcludeFiles="TXT,ModifiedDate<=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 50, 000)"

ExcludeFiles="ModifiedDate>=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 49, 000),ModifiedDate<=CURRENT_TIMESTAMP()"

InsertMode

This property is for Destination only. It specifies the mode for inserting data into CSV files.

  • FilePerBatch (mandatory for SEI Cloud): A new CSV file will be created for every batch. The file name is formatted as yyyyMMddhhmmssSSS_batchId, where the date and time represent the beginning of the transaction.

  • SingleFile (Default): In this mode everything will be inserted in a single CSV file.

Note:  Note that SingleFile can only be done for local and network path.

CreateBatchFolder Whether to create a folder or not when InsertMode is set to FilePerBatch. Default Value is true.
BatchNamingConvention

Determines the naming convention of batch files.

  • Timestamp_BatchNumber: The file name is formatted as yyyyMMddhhmmssSSS_batchNumber, where the time stamp marks the beginning of the transaction.

  • TableName_BatchNumber (default): The file name is formatted as TableName_batchNumber.csv.