📄️ Folders, Files, Data Sets, and File Headers
All files uploaded to a folder must have the same header. The folder represents a data set, possibly composed of several files. For example, you might train with a year's worth of data. That data could be in one file containing entries across the entire year, or in many files where each file contains entries for a specific month. Either way, the entire folder is treated as a single data set. An advantage of this approach is that you can modify your data set by managing the files in the folder. For example, with a monthly file approach you can retrain with more recent data by deleting older monthly files and adding the latest monthly data.
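Because every file in a folder must share one header, it can help to check a local directory before uploading. The following is a minimal sketch, assuming the files are comma-separated text with the header on the first line. The function names `read_header` and `find_header_mismatches` are hypothetical helpers for illustration and are not part of ML Studio.

```python
import csv
from pathlib import Path


def read_header(path):
    """Return the first row of a CSV file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def find_header_mismatches(folder):
    """Compare each CSV header in `folder` with the first file's header.

    Assumption: files are local *.csv files; this does not call ML Studio.
    Returns a dict mapping file name -> header for files that differ.
    """
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        return {}
    reference = read_header(files[0])
    mismatches = {}
    for path in files[1:]:
        header = read_header(path)
        if header != reference:
            mismatches[path.name] = header
    return mismatches
```

An empty result means all files found in the directory share the header of the first file (in sorted name order), so they could be treated as one data set.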
📄️ Managing Data
Files are stored in ML Studio, whether Azure-hosted or installed on-premises, and are organized in folders. To create a folder, navigate to the "Folders" page and click "Create Folder".
📄️ Uploading File(s)
Data must be uploaded to ML Studio in order for a model to process it. To upload your data, select the folder you wish to store it in and click "Upload File(s)".
📄️ Creating a Specification
Before you can create a model, you need a folder, at least one data file, and a file specification, also referred to as a 'spec'. The file specification describes how the data in each column should be treated by a model. Since all files in a folder must have the same header, a spec is associated with a folder and applies to all files in that folder.
📄️ Folder Partitioning and Sampling
Classification and cluster model development often requires working with samples or partitions of the original data set. ML Studio has features to assist in making these data subsets. These features are accessed through the Create a Partition button in the folder display. If the folder contains no files, no partition or sample can be created, so the Create a Partition button is disabled.
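As a rough illustration of what a partition is, the sketch below splits a list of rows into two disjoint subsets at random. It assumes a simple random split with a fixed seed; `partition_rows` is a hypothetical name, and this is not a description of how ML Studio builds its partitions or samples.

```python
import random


def partition_rows(rows, fraction, seed=0):
    """Randomly split `rows` into two disjoint lists.

    `fraction` is the share of rows placed in the first list.
    Assumption: a plain shuffled split; ML Studio's own method may differ.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # seeded for repeatable results
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]
```

Using a fixed seed keeps the split repeatable, which makes results from different model runs easier to compare.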
📄️ Data Refresh: Merge, Update, Extend
Updating records is a critical component of live, integrated data systems. The data merging feature lets you update your data set on a record-by-record basis.
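To make the idea of a record-by-record update concrete, here is a minimal sketch of a keyed merge. It assumes each record is a dictionary with a unique key column; the function name `merge_records` and the column names in the usage example are hypothetical, and the exact semantics of ML Studio's merge, update, and extend operations may differ.

```python
def merge_records(existing, incoming, key):
    """Merge `incoming` records into `existing`, matching on `key`.

    Records with a matching key are updated field by field;
    records with a new key are appended. Inputs are left unmodified.
    Assumption: `key` is present and unique within each input list.
    """
    merged = {row[key]: dict(row) for row in existing}
    for row in incoming:
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())
```

For example, merging `[{"id": 2, "status": "closed"}]` into a data set that already has a record with `id` 2 replaces that record's `status` and leaves all other records unchanged.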
📄️ File Requirements
All uploaded files MUST meet the File Requirements.