Time Series Data

Slycat™ accepts two time series data formats, which we will call Xyce and CSV. Each input format consists of two parts: a table file describing the entire ensemble, and a time series data file for each simulation in the ensemble. As with the CCA and Parameter Space models, the table is at the heart of the model. For each simulation (that is, for each row in the data table), there must be a file with time series data. Each of these time series files contains sequences of values, sampling one or more output variables over the course of the simulation. The simulations need not all write the same number of samples into their time series files, but each simulation must have a corresponding data file whose output variables match those of the other simulations and cover the same time range.
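The compatibility requirement can be illustrated with a minimal Python sketch. It is not Slycat™'s own validation code: the function names, the "time" column name, and the tolerance are assumptions made for this example. It summarizes two CSV time series and checks that they share the same variables and time range, even though their sample counts differ.

```python
import csv
import io

def summarize_series(text, time_column="time"):
    """Return (header names, first time, last time, sample count) for one CSV series.

    The "time" column name is an assumption made for this example.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    t_index = header.index(time_column)
    times = [float(row[t_index]) for row in reader if row]
    return header, times[0], times[-1], len(times)

def compatible(a, b, tolerance=1e-9):
    """Two series are compatible if they share variables and cover the same time range."""
    return (set(a[0]) == set(b[0])
            and abs(a[1] - b[1]) <= tolerance
            and abs(a[2] - b[2]) <= tolerance)

# Two hypothetical runs: same variables and time range, different sample counts.
run1 = summarize_series("time,voltage\n0.0,1.0\n0.5,1.2\n1.0,1.1\n")
run2 = summarize_series("time,voltage\n0.0,0.9\n1.0,1.3\n")
print(compatible(run1, run2))
```

A check of this kind can be run over an ensemble before model creation to catch a run with a missing variable or a truncated time range.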

Xyce Format File Structure

The Xyce format consists of Xyce-generated time series files stored within a fixed directory hierarchy. The hierarchy is rooted in a single top-level directory, which must contain a dakota_tabular.dat file (providing the data table). The file need not be named dakota_tabular.dat, but its format must match that of the dakota_tabular.dat files generated by Dakota. Additionally, a set of subdirectories (one per run) must be located in the same directory as the dakota_tabular.dat file. These subdirectories should all be named using a template like workdir.n, where n is the simulation number. Within each subdirectory, there must be a time series file generated by Xyce that is formatted as a .prn file. The time series files must all be named identically (the subdirectory identifies which simulation generated each one), and each file must contain a shared set of time series variables (columns with matching headers within each of the .prn files).
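As a rough illustration of this layout, the following Python sketch looks for run subdirectories that are missing their time series file. The function name find_missing_runs, the file name circuit.prn, and the placeholder file contents are hypothetical; only the workdir.n template and the dakota_tabular.dat name come from the description above.

```python
import re
import tempfile
from pathlib import Path

# Layout being checked (circuit.prn is a hypothetical file name):
#   ensemble_root/
#     dakota_tabular.dat
#     workdir.1/circuit.prn
#     workdir.2/circuit.prn
#     ...

def find_missing_runs(root, series_name, pattern=r"workdir\.(\d+)"):
    """Return the run numbers whose subdirectory lacks the expected time series file."""
    missing = []
    for sub in Path(root).iterdir():
        match = re.fullmatch(pattern, sub.name)
        if sub.is_dir() and match and not (sub / series_name).is_file():
            missing.append(int(match.group(1)))
    return sorted(missing)

# Build a small throwaway hierarchy in which run 2 has no time series file.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "dakota_tabular.dat").write_text("placeholder\n")
    for n in (1, 2, 3):
        (Path(root) / f"workdir.{n}").mkdir()
    for n in (1, 3):
        (Path(root) / f"workdir.{n}" / "circuit.prn").write_text("placeholder\n")
    missing = find_missing_runs(root, "circuit.prn")
    print(missing)  # [2]
```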

CSV Format File Structure

The CSV format (used, for example, by heartbeat.dat files produced by Sierra or .csv outputs from Catalyst) is less structured than the Xyce format. The individual time series files need not be stored in the same directory hierarchy as the data table, nor does their directory layout need to follow any particular structure or naming convention. Instead, the data table is a CSV file that contains a column of URIs providing the full path to each time series file. The time series files must also be CSV files (.prn files are not accepted). Each URI must have the format: file://machine/absolute_directory_path/timeseries_filename.csv.
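The URI column can be generated programmatically. The sketch below is one possible way to do so in Python, not a Slycat™ utility: the helper timeseries_uri, the machine name cluster01, the paths, and the column names run, resistance, and timeseries are all hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath

def timeseries_uri(machine, path):
    """Build a file://machine/absolute_path URI for one time series file."""
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError(f"path must be absolute: {path}")
    return f"file://{machine}{p}"

# Hypothetical ensemble: one input variable plus the URI of each run's time series.
rows = [
    {"run": 1, "resistance": 10.0,
     "timeseries": timeseries_uri("cluster01", "/data/study/run1/output.csv")},
    {"run": 2, "resistance": 20.0,
     "timeseries": timeseries_uri("cluster01", "/data/study/run2/output.csv")},
]

# Write the data table as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["run", "resistance", "timeseries"])
writer.writeheader()
writer.writerows(rows)
table_text = buffer.getvalue()
print(table_text)
```

Rejecting relative paths up front reflects the requirement that each URI carry a full path to its time series file.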

Time Series Files

Whether we are using .prn files or CSV files, both formats are essentially tables in which each column is a separate variable and each row holds a set of concurrent samples, one for each of the variable columns. The first line of a time series file contains headers, which provide the names of the time series output variables. Note that in a CSV file, we expect to see only a single row of header information consisting of the column names. Some physics codes output two rows of header information, with the variable names in the first row and the units in the second row; this is not a legal CSV format. At least one column must be a time value (typically the first column).
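A quick way to spot the two-row header problem is to test whether the second row of a file is numeric. The following Python sketch shows the idea under the assumption that all data values are numeric; the function names and sample data are hypothetical, and the check is not part of Slycat™ itself.

```python
import csv
import io

def is_number(text):
    """Return True if the text parses as a floating point value."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def check_single_header(text):
    """Return the column names if row 1 is a header and row 2 is numeric data."""
    rows = csv.reader(io.StringIO(text))
    header = [cell.strip() for cell in next(rows)]
    first_data = [cell.strip() for cell in next(rows)]
    if all(is_number(cell) for cell in header):
        raise ValueError("first row looks like data, not column names")
    if not all(is_number(cell) for cell in first_data):
        raise ValueError("second row is not numeric (possibly a units row)")
    return header

good = "time,temperature\n0.0,300.0\n1.0,305.5\n"
bad = "time,temperature\ns,K\n0.0,300.0\n"  # second header row holds units

print(check_single_header(good))  # ['time', 'temperature']
try:
    check_single_header(bad)
except ValueError as err:
    print("rejected:", err)
```

A file that fails this check can usually be repaired by deleting the units row before model creation.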

If your data is not currently in one of these two formats, Excel can be used to create CSV files from most common table formats. Note that if output metrics have been created separately in a post-processing step, they will need to be integrated with the inputs to form a single file prior to model creation.

HDF5 Intermediary Format

In the time series creation wizard, data in either format is rewritten as HDF5 files in a temporary Slycat™ directory (we have found that this significantly speeds up our processing compared to working with the files in their original format). If you opt to keep these HDF5 files, they constitute a third data format that the wizard will accept. Be aware, however, that HDF5 files created through other means are not interchangeable, since their internal structures will be different.