Sample files are a key component of OneCloud Data Prep. They aid in the creation of transformations in Pipelines and Mapping Groups. While a sample file isn't technically required, loading one is strongly advised because it makes creating Pipelines more efficient.

A common misconception is that a sample file is a dataset that needs to be processed by a OneCloud Data Prep Pipeline. The data in a sample file is not actually modified by a Pipeline or Mapping Group; instead, it provides a visual indication of the impact of each transformation as it is defined. In other words, data is not affected by Data Prep until the Pipeline is executed with an actual data payload. The sample file is used only to streamline the creation of a Pipeline.

A sample file should represent the data payload that will be processed by a Pipeline. A data payload is defined as the set of columns upon which a Pipeline or Mapping Group will perform transformations. Sample files are limited to a file size of 1 megabyte (MB), so a sample file may contain some or all of the data that will be processed by the Pipeline, depending on the number of columns and rows in the actual data set.
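When the actual data set is larger than the limit, one way to produce a sample is to keep the header and as many leading rows as fit. The sketch below is only an illustration: build_sample is a hypothetical helper, not part of OneCloud Data Prep. It assumes that each record sits on a single line and that 1 MB can be read conservatively as 1,000,000 bytes.

```python
MAX_SAMPLE_BYTES = 1_000_000  # assumption: 1 MB read conservatively as 1,000,000 bytes


def build_sample(lines, max_bytes=MAX_SAMPLE_BYTES):
    """Keep leading lines (header first) until the next line would exceed max_bytes.

    Assumes one record per line, with line endings included in each string.
    """
    kept = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        if size + line_size > max_bytes:
            break
        kept.append(line)
        size += line_size
    return kept
```

For a real file, the lines could come from an open file object, and the returned list could be written out with writelines.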

Sample File Format

Sample files must be in a delimited format and viewable in a text editor such as Notepad/Wordpad, Notepad++, or Textpad. Sample files support the following delimiters:

  • Comma (,)

  • Tab

  • Pipe (|)

  • Semi-colon (;)
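To check which of these delimiters a file uses before loading it, a simple heuristic is to count each candidate in the header line. This is a minimal sketch, not a description of how Data Prep itself detects delimiters. The name detect_delimiter is hypothetical, and the approach assumes the header names do not themselves contain delimiter characters.

```python
SUPPORTED_DELIMITERS = [",", "\t", "|", ";"]


def detect_delimiter(header_line):
    """Return the supported delimiter that appears most often in the header line."""
    counts = {d: header_line.count(d) for d in SUPPORTED_DELIMITERS}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("no supported delimiter found in header line")
    return best
```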

Sample files require a consistent data layout for all rows: every row must contain the same columns as the header. A null or blank value is acceptable for any of the defined columns. Notice that in the example sample file below, the PRODUCT field is blank on the fourth row.

Valid Sample File format
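The layout rule can also be checked with a short script. In the sketch below, the file contents are invented for illustration (only the PRODUCT column name comes from this article), and inconsistent_rows is a hypothetical helper. A blank value still counts as a column, so the line with the empty PRODUCT field passes the check.

```python
import csv
import io

# Illustrative sample; PRODUCT is blank on the fourth line of the file.
SAMPLE = (
    "ENTITY,ACCOUNT,PRODUCT,AMOUNT\n"
    "E100,4000,P10,1250.00\n"
    "E100,4010,P20,310.50\n"
    "E200,4000,,980.25\n"
    "E200,4010,P10,45.00\n"
)


def inconsistent_rows(text, delimiter=","):
    """Return 1-based line numbers whose column count differs from the header's."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])
    return [n for n, row in enumerate(rows, start=1) if len(row) != expected]
```

An empty result means every row matches the header's column count.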

File Headers

A sample file must have a header row. While the header names and order in the sample file are not required to be identical to the data payload that will be processed by the Pipeline, it can be useful to align the sample file and the data payload whenever possible to avoid confusion and streamline Pipeline creation and invocation.
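To see how closely a sample file's header lines up with the data payload, the two header rows can be compared directly. This is a minimal sketch with a hypothetical helper name, header_differences. It only reports differences, since matching headers are recommended here rather than required.

```python
def header_differences(sample_header, payload_header):
    """Compare two header rows given as lists of column names.

    Returns (names only in the sample, names only in the payload,
    whether both headers are identical including order).
    """
    sample_only = [name for name in sample_header if name not in payload_header]
    payload_only = [name for name in payload_header if name not in sample_header]
    same_order = sample_header == payload_header
    return sample_only, payload_only, same_order
```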

