After a Pipeline has been created, a Column Definition must be specified, and the Define Columns screen opens automatically for this purpose. There are three ways that a Column Definition can be created for the new Pipeline:
Utilize an existing column definition
Create from a file
Add columns manually
Utilize an Existing Definition
The most efficient way to define the columns of a Pipeline is to use a sample file. When a sample file is uploaded to Data Prep, a Column Definition is created for it. If that sample file will be used to aid in the creation of the Pipeline, then the Column Definition of the sample file can be used to create the Column Definition of the Pipeline. This method should be used whenever possible.
📓 Sample files are limited to a file size of 1 megabyte (MB).
To create a Column Definition for a Pipeline from a sample file loaded to Data Prep:
Click the "Pick From List" button on the Define Columns form.
Select a sample file that has been uploaded to Data Prep from the Files dropdown.
A preview window will display the sample file column definition and show a warning that the columns in the Pipeline will be replaced by the sample file columns. Click OK.
Review the Column Definition and, if needed, make updates for Column Names.
💡 The name of a column in the Column Definition is not required to match the name of the column/field in the data payload that will be processed by the Pipeline. Column names can also be changed in the Pipeline Column Definition after it has been created from a sample file.
Create from File
A Pipeline Column Definition can also be created from a file saved locally or on a network drive. The file must be delimited and contain column headers.
Click the "Create From File" button on the Define Columns form.
Browse and select the file to use to populate the Pipeline Column Definition.
A preview window will display the list of columns detected in the file.
The Column Definition will be populated based on the file selected.
Review the Column Definition for accuracy.
The Column Name is populated from the file header record and the data type is populated based on the data detected in the rows.
⚠️ Because the data type is detected from the data in the rows, it may need to be updated.
Once the column definition is confirmed, save the changes.
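The way a Column Definition is derived from a delimited file with a header record can be illustrated with a minimal Python sketch. This is not Data Prep's actual detection logic; the function names (`infer_type`, `infer_column_definition`), the type labels, and the single supported date format are all assumptions made for illustration. The sample also shows why the detected data type should be reviewed: a ZIP code column is detected as an integer, which would drop its leading zero.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Return a coarse, hypothetical type label for a list of string cells."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "string"  # nothing to inspect, fall back to the safest type

    def all_parse(parser):
        # True only if every non-empty value is accepted by the parser.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    # Assumption: only ISO-style dates (YYYY-MM-DD) are recognized here.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def infer_column_definition(text, delimiter=","):
    """Build a list of (column name, type label) pairs from delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        raise ValueError("file is empty; a header record is required")
    header, data = rows[0], rows[1:]
    definition = []
    for i, name in enumerate(header):
        # Column name comes from the header; type comes from the data rows.
        column_values = [row[i] for row in data if i < len(row)]
        definition.append((name.strip(), infer_type(column_values)))
    return definition


sample = (
    "id,amount,order_date,zip\n"
    "1,9.99,2024-01-05,02134\n"
    "2,12.50,2024-01-06,10001\n"
)
for column_name, data_type in infer_column_definition(sample):
    print(column_name, data_type)
```

In this sketch the `zip` column is labeled `integer` because every value parses as a whole number, even though it should be kept as text. This is the kind of case where the data type needs to be corrected before saving.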
Add Columns Manually
To manually add columns to the Column Definition:
Click the blue plus sign (+) in the upper right corner of the Define Columns form.
Specify the data type of the column.
Specify the name of the column.
Repeat these steps for each column that needs to be part of the Pipeline.
Modifying a Pipeline Column Definition
The Column Definition of a Pipeline can be modified at any time to update the Data Type or Name of a column. To modify the Column Definition:
Select the Columns tab of the Columns and File Pane.
Click "Edit Columns".
Modify the Column Definition as needed, and then click Save.