A visual approach to creating multivariate geovisualization test data
Keywords: synthetic data generation, multivariate, spatial data, glyph visualization, usability testing
Abstract. Testing the perceptual qualities of multivariate geovisualizations, for example glyph designs, requires suitable test data. Synthetic data can be useful for evaluating different data characteristics and how well different designs make them perceptible. Real data may not contain all the characteristics that are interesting for testing, for example particular spatial distributions or trends. Additionally, real data may contain a lot of noise or randomness. Nevertheless, real data are essential for testing design performance in realistic situations. Fuchs et al. (2017) conducted a systematic review of 64 glyph evaluation studies, of which more than 60% used synthetic data. They note that, for a better understanding of glyph designs, more studies should evaluate glyphs with both synthetic and real data. Besides visualization evaluation, a number of application areas, e.g. machine learning or software testing, generate and use synthetic data. Consequently, many methods and tools for generating synthetic data exist. Typically, a mathematical function or statistical distribution is defined, and random or ordered values, optionally overlaid with noise, are drawn from the defined model to build the synthetic data. However, we found it difficult to use existing tools to create spatial distributions of data, especially multivariate ones, that follow specific rules and display interactions between the data dimensions. We therefore designed a process that allows the intuitive ‘drawing’ of spatial data distributions and, subsequently, the derivation of multivariate data from several overlaid layers of ‘drawings’.
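To make the conventional, function-based approach mentioned above concrete, the following minimal sketch generates a small multivariate spatial data set on a regular grid. It is an illustration only and not the drawing-based process proposed here: the function names (gradient, hotspot, make_grid_data), the grid size, the two spatial patterns, and the noise level are all assumptions chosen for the example.

```python
import math
import random


def gradient(x, y):
    """Hypothetical pattern: linear west-to-east trend, 0 at x=0 and 1 at x=1."""
    return x


def hotspot(x, y, cx=0.5, cy=0.5, sigma=0.2):
    """Hypothetical pattern: radial Gaussian bump with peak value 1 at (cx, cy)."""
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))


def make_grid_data(n=5, noise_sd=0.05, seed=42):
    """Return one record per cell of an n x n grid over the unit square.

    Each record holds the cell coordinates and two variables, each drawn
    from its own spatial function and overlaid with Gaussian noise.
    """
    rng = random.Random(seed)  # seeded so that the test data are reproducible
    records = []
    for row in range(n):
        for col in range(n):
            x = col / (n - 1)
            y = row / (n - 1)
            records.append({
                "x": x,
                "y": y,
                "var_a": gradient(x, y) + rng.gauss(0, noise_sd),
                "var_b": hotspot(x, y) + rng.gauss(0, noise_sd),
            })
    return records


data = make_grid_data()
```

Each variable here needs its own hand-written function, and any interaction between variables would have to be encoded in those functions as well, which illustrates why more complex, rule-based multivariate patterns become cumbersome to specify this way.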