Using curios.IT

You can explore your data with any of the 18 available data viewers. Each viewer is designed for a specific purpose (e.g. finding outliers or comparing groups). To choose the active viewer, click the eye icon to open the main menu and select a viewer from it.

Subset history

In most viewers you can select a subset of the data by clicking visual elements in the chart. The average field values of the selected records are then shown on the right side, with a selection of record labels below them. To make the selected records the current data set (drill down), click the blue 'Selected' button on the left. To make all BUT the selected records the current data set, click the 'NOT Selected' button. Each time you make a subset the new current data set, a button that takes you back to the previous set is added on the left (the subset history).
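The subset history behaves like a stack: each drill-down saves the current set, and a history button returns to an earlier one. The Python sketch below models this under stated assumptions: records are plain dicts, `SubsetHistory` and its methods are hypothetical names (not part of curios.IT), and going back discards the later subsets, which may differ from how the application actually behaves.

```python
class SubsetHistory:
    """Hypothetical model of the subset history: a stack of parent data sets."""

    def __init__(self, records):
        self.current = list(records)
        self._parents = []  # one entry per history button

    def _push_and_filter(self, selected, keep_selected):
        chosen = {id(r) for r in selected}  # match records by identity, not by content
        self._parents.append(self.current)
        self.current = [r for r in self.current if (id(r) in chosen) == keep_selected]

    def drill_down(self, selected):
        """Like the 'Selected' button: the selected records become the current set."""
        self._push_and_filter(selected, keep_selected=True)

    def exclude(self, selected):
        """Like the 'NOT Selected' button: all but the selected records become current."""
        self._push_and_filter(selected, keep_selected=False)

    def back(self, steps=1):
        """Like a history button: go back `steps` levels (assumption: later sets are dropped)."""
        for _ in range(steps):
            self.current = self._parents.pop()
        return self.current


records = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
history = SubsetHistory(records)
history.drill_down([records[0], records[2]])  # current set: A, C
history.exclude([records[0]])                 # current set: C
history.back()                                # current set: A, C again
```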

Subset creation options

You can also create a subset directly, using the buttons on the left side of the screen:

  • 'by Value' creates a subset of the records that satisfy a selection rule (e.g. 'Performance'>5).
  • 'by Category' creates a subset of the records that belong to one or more selected categories (you have to select the category column first).
  • 'Typical' creates a subset from the fraction of records closest to the average of all records. The outliers far from the average are removed.
  • 'Outliers' creates a subset from the fraction of records farthest from the average (the outliers).
  • 'Add Category' adds a category column by applying one of the grouping methods. Each record is labelled with the name of its group (e.g. "Cluster 5").
  • 'Select columns' lets you reduce the number of columns in the data set. This is useful if you want to restrict algorithms such as clustering to certain columns.
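The 'by Value', 'Typical' and 'Outliers' options can be sketched in Python as follows. This is a minimal illustration, not curios.IT's implementation: the function names are hypothetical, the rule is limited to the form field > threshold, and "distance from the average" is assumed to be the Euclidean distance over z-score-standardized numeric fields. The `fraction` parameter (share of records kept as typical) is also an assumption.

```python
import math

def by_value(records, field, threshold):
    """'by Value' with a rule of the form field > threshold (hypothetical helper)."""
    return [r for r in records if r[field] > threshold]

def split_typical_outliers(records, fields, fraction=0.9):
    """Return (typical, outliers): the `fraction` of records closest to the average, and the rest.

    Assumption: each field is standardized (z-score) and the distance to the
    average is Euclidean; curios.IT may use a different measure.
    """
    n = len(records)
    means = {f: sum(r[f] for r in records) / n for f in fields}
    stds = {}
    for f in fields:
        var = sum((r[f] - means[f]) ** 2 for r in records) / n
        stds[f] = math.sqrt(var) or 1.0  # constant field: avoid division by zero

    def distance(r):
        return math.sqrt(sum(((r[f] - means[f]) / stds[f]) ** 2 for f in fields))

    ranked = sorted(records, key=distance)  # closest to the average first
    cut = round(fraction * n)
    return ranked[:cut], ranked[cut:]
```

For example, `by_value(records, "Performance", 5)` expresses the 'Performance'>5 rule mentioned above. Standardizing first keeps a field with a large numeric range from dominating the distance.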

As before, you can return to a parent data set by clicking its button in the subset history.

Grouping options

In most viewers, data can be grouped in the following ways:

  • ALL: no grouping, all records are put into a single group.
  • CLUST-SOM: centroid-based clustering with a SOM (self-organizing map). The number of clusters is chosen based on the number of records, but can be changed by setting the dimensions of the SOM grid. Some clusters (more precisely, SOM grid points) may be empty.
  • CLUST-KMEANS: centroid-based k-means clustering. The number of clusters can be specified.
  • CLUST-DBSCAN: density-based DBSCAN clustering. Its parameters can be specified.
  • CATEGORIES: group by a category column. All records with the same value in this column are put into the same group.
  • INTERVALS: group by the values of a numeric field. The value ranges of the groups are chosen so that each group contains approximately the same number of records.
  • HISTOGRAM: group by the values of a numeric field. The value ranges of the groups have equal width (the full range from min to max is divided into 10 intervals).
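INTERVALS and HISTOGRAM differ only in where the group boundaries are placed. A minimal Python sketch, with hypothetical function names: INTERVALS puts the boundaries at equal-count positions in the sorted values (quantiles), while HISTOGRAM cuts the min-max range into 10 equal-width intervals. The number of INTERVALS groups and the handling of values that fall exactly on a boundary are assumptions, not documented curios.IT behavior.

```python
from bisect import bisect_right

def interval_edges(values, k):
    """INTERVALS-style boundaries: taken at equal-count positions in the sorted values."""
    s = sorted(values)
    n = len(s)
    return [s[j * n // k] for j in range(1, k)]

def histogram_edges(values, bins=10):
    """HISTOGRAM-style boundaries: min..max divided into `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + j * width for j in range(1, bins)]

def assign(values, edges):
    """Group index of each value = number of boundaries less than or equal to it."""
    return [bisect_right(edges, v) for v in values]

skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(assign(skewed, histogram_edges(skewed)))     # [0, 0, 0, 0, 0, 0, 0, 0, 0, 9]
print(assign(skewed, interval_edges(skewed, 10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With a skewed field like this, HISTOGRAM puts nine records in the first group and one in the last, leaving eight groups empty, while INTERVALS puts one record in each of the ten groups.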

Category colors

Each category in the main data set is assigned a color automatically. If there are many categories, the colors may be similar and therefore hard to distinguish. If the currently active subset contains only a few categories, you can let curios.IT create a new color table for the subset with more distinct colors. To do so, choose 'Category color set from' -> 'Subset'.
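One common way to build a color table is to space hues evenly around the color wheel, so the gap between neighboring colors shrinks as the number of categories grows. The Python sketch below assumes this scheme (curios.IT's actual palette may differ) and uses the standard `colorsys` module; `color_table` and its saturation/value defaults are hypothetical. Building the table only from the categories present in a subset gives each of them a larger share of the wheel.

```python
import colorsys

def color_table(categories, saturation=0.65, value=0.9):
    """Map each distinct category to a hex color; hues are evenly spaced (assumed scheme)."""
    cats = sorted(set(categories))
    table = {}
    for i, cat in enumerate(cats):
        hue = i / len(cats)  # neighboring categories are 1/len(cats) of the wheel apart
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        table[cat] = "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
    return table

# Main data set with 20 categories: neighboring hues are 1/20 of the wheel apart.
main_colors = color_table([f"Cluster {i}" for i in range(1, 21)])
# Subset containing only 3 of them: neighboring hues are 1/3 of the wheel apart.
subset_colors = color_table(["Cluster 2", "Cluster 5", "Cluster 7"])
```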