
Formats of the National Land Survey of Finland’s geospatial datasets

LAZ

LAZ is a losslessly compressed form of the LAS format used for the distribution of laser scanning data.  LAS is an open file format developed by the American Society for Photogrammetry and Remote Sensing (ASPRS) that has also been adopted as an OGC community standard (2018).


LAZ compression reduces a LAS file to roughly 7–20% of its original size. The LAZ method is open source and was first used in the LAStools library.

The LAS format can be used with most software products that use laser scanning data.  Not all of them are capable of directly reading the compressed LAZ format, but there are many open source and non-open source programs for decompressing LAZ, such as LASzip, LAZperf and FME.


GML

GML is an XML format developed for the distribution of geospatial data. GML was developed by the international standardisation organisations ISO and OGC.


The National Land Survey of Finland uses GML for the distribution of open data from the cadastral register and the Topographic Database, for example. The INSPIRE download services (WFS services) are also based on the GML format.


The GML format can be used to distribute complex geospatial data structures. For example, the same object can have several different geometries: the municipal administrative centre as a point, the municipal borders as lines and the municipal territory as a polygon. GML supports geometries that consist partly or entirely of arcs, such as circles, as well as multi-valued attributes such as place names, which can have versions in Finnish, Swedish and the different Sámi languages.

 
The weakness of GML is that it can be difficult to use, especially when the GML file contains complex structures. Off-the-shelf software may not support all of the features of GML, and developers also find the file format cumbersome. In addition, text format GML is first and foremost a transfer format that is not directly suitable for any high-performance purposes; instead, the data needs to be converted to another file format before use.


Despite its weaknesses, GML is the file format that most reliably transfers the National Land Survey’s vector datasets in their original, unchanged form. The differences compared to the GeoPackage format are minor, though.


QGIS is able to open GML files from the Topographic Database and the National Land Survey’s nomenclature without any problems.


GeoPackage


GeoPackage (GPKG) is an OGC standard adopted in 2014 that defines a uniform way to save raster and vector data in an SQLite database. A GeoPackage database is a single file that can contain an unlimited number of separate vector and raster map layers. For the user, this means fewer files to download.  This is particularly noticeable in the case of the National Land Survey’s Topographic Database, which contains more than a hundred feature classes. GeoPackage files have no size limit and there is therefore no need to split the data into parts.
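
Because a GeoPackage is an SQLite database, its layers can be listed with any SQLite client. The following is a minimal sketch using Python's built-in sqlite3 module. It relies on the gpkg_contents table and its table_name and data_type columns, which are defined by the GeoPackage standard; the function name list_layers is a hypothetical helper introduced here for illustration.

```python
import sqlite3


def list_layers(gpkg_path):
    """Return (table_name, data_type) pairs from a GeoPackage's gpkg_contents table."""
    # A GeoPackage is an ordinary SQLite file, so sqlite3 can open it directly.
    conn = sqlite3.connect(gpkg_path)
    try:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Calling list_layers on a downloaded file (for example a hypothetical topographic_database.gpkg) would return one entry per layer, with data_type telling vector layers ("features") apart from raster layers ("tiles").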


GeoPackage is a versatile file format that supports almost all the features used in the GML products of the National Land Survey of Finland. As GeoPackage always uses UTF-8 character encoding, it has no problems with special characters such as those used in the Sámi languages. In addition, GeoPackage is fast, so there is no need to convert it to other formats: it is ready to use as such. Furthermore, GeoPackage is scalable as a database, and it is possible to store vector data presentation styles in the same database as the datasets. As the GeoPackage standard is still unfinished in this respect, the NLS tests the saving of styles in a manner supported by QGIS.


Application software support for GeoPackage files is better than for GML, but worse than for Shapefile. The most common geospatial data software products, such as QGIS, ArcGIS and MapInfo, are able to use GeoPackage files directly.


ESRI Shapefile

Shapefile is a file format introduced by ESRI in 1994. Shapefile became widely used in non-ESRI software after ESRI published the main features of the format’s technical structure in 1998. Shapefile is probably the most widely supported vector format in geospatial data software today, and it is also widely used for the distribution of data. Shapefile’s main advantage is its extensive software support and its speed, which make it suitable for both transfer and use.


We have become intimately familiar with the features of Shapefile and therefore know how to deal with its shortcomings, but the file format is becoming outdated. The worst of Shapefile’s shortcomings is that it cannot transfer all types of data. For example, the format does not support DATETIME but only the date, strings cannot be longer than 254 characters and fields cannot contain NULL values. Other problems include the maximum file size of 2 gigabytes, which means that large files must be split into parts. The need for splitting is further increased by the fact that a Shapefile may only contain one map layer and the geometries in one Shapefile must be of the same type, i.e. either points, lines or areas. Difficulties with special characters are also common, as there is no single commonly used method for Shapefile character encoding.
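
The attribute limits listed above can be expressed as a simple pre-export check. This is only a sketch: shapefile_issues is a hypothetical helper, not part of any Shapefile library, and it covers just the three attribute restrictions named in the text (NULL values, time of day, and strings longer than 254 characters).

```python
from datetime import datetime

MAX_TEXT_LENGTH = 254  # string limit named in the text


def shapefile_issues(record):
    """List reasons why a record (field name -> value) would not survive a Shapefile export unchanged."""
    issues = []
    for name, value in record.items():
        if value is None:
            issues.append(f"{name}: NULL value is not supported")
        elif isinstance(value, datetime):
            # Only the date part can be stored; the time of day would be lost.
            issues.append(f"{name}: time of day would be dropped")
        elif isinstance(value, str) and len(value) > MAX_TEXT_LENGTH:
            issues.append(f"{name}: text is longer than {MAX_TEXT_LENGTH} characters")
    return issues
```

An empty list means the record passes these three checks; it says nothing about the file size limit or character encoding.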


The National Land Survey of Finland distributes different types of data in Shapefile format. If there is also a GML or GeoPackage version of the same data, the Shapefile is their parallel product with a different data content due to the technical limitations of the format.


Shapefiles can be processed with all the most common geospatial data software, including ESRI products, MapInfo Pro, QGIS and Tatuk GIS.

 

MapInfo MID/MIF

Like ESRI Shapefile, the MID/MIF file format developed by MapInfo dates back to the mid-1990s. MID/MIF is a text transfer format, equivalent to the binary TAB format. TAB is only used in MapInfo, but MID/MIF has also been a common transfer format due to its simplicity. Today, GeoJSON has mostly replaced MID/MIF in this respect.


The limitations of MapInfo MID/MIF are similar to those of Shapefile, such as a size limit of 2 gigabytes, a maximum string length of 254 characters and difficulties in using special characters. MapInfo has also introduced a new “NativeX Extended Tab” file format which does not have these limitations, but this file format is not supported by the National Land Survey of Finland.


MapInfo MID/MIF is no longer of much relevance as a general transfer file format, and MapInfo users no longer require MID/MIF either, as MapInfo can use both GeoPackage and Shapefile.


In addition to MapInfo, MID/MIF files can be opened with QGIS.


GeoTIFF

GeoTIFF is an ordinary TIFF image file with the metadata required for geospatial use saved inside it. The most important piece of metadata is georeferencing, i.e. information about the geographical location of the image and the coordinate system used by the image. Before the development of GeoTIFF, the location of an image could be indicated in a separate “world file” and the coordinate system in a separate projection file. Storing the georeferencing data within the image file has the important advantage that it cannot be lost, and GeoTIFF can also contain other metadata. The first GeoTIFF definition was prepared in 1994, and GeoTIFF became an OGC standard in 2019. GeoTIFF is probably the world’s most widely used file format for orthophotos and satellite images.


TIFF is a versatile file format.  It is often thought that all TIFF files are uncompressed and therefore large, but a TIFF can also be compressed. The compression can be achieved by using non-destructive compression methods, such as LZW, or more efficient methods that will permanently change the data, such as JPEG. In the past, it was common for image viewers to be unable to open some TIFF formats, but these problems are now less common.


The National Land Survey of Finland uses GeoTIFF for the distribution of specific raster map data and elevation models. The raster map TIFF files are 8-bit, LZW-compressed, single-channel images with an associated colour palette.  The elevation model files are 16-bit single-channel grayscale images.


TIFF and GeoTIFF images can be viewed with both standard image viewers and geospatial data software. However, dedicated software is required to use the 16-bit elevation model TIFF files.


JPEG 2000

JPEG 2000 is an advanced image compression method designed to replace JPEG, which dates back to 1992. JPEG 2000 allows both lossless and lossy compression, and the quality of the compressed image relative to the file size is better than with JPEG.


JPEG 2000 is a highly complex and sophisticated method. The best JPEG 2000 program libraries are expensive, and the few open source solutions were slow until recently. Since other compression methods and free software have achieved faster compression and decompression with good enough quality at the cost of slightly larger files, JPEG 2000 has not become widespread in geospatial data use, and GeoTIFF is dominant. JPEG 2000 has been adopted in other contexts, however, such as digital cinemas and medical imaging.


Geospatial metadata can be saved in a JPEG 2000 file in the same way as in a GeoTIFF file. The method is described in the OGC standard “GML in JPEG 2000”.


The orthophotos available as open data from the National Land Survey of Finland are losslessly compressed JPEG 2000 files according to the GML in JPEG 2000 standard. The main reason for choosing this file format is that JPEG 2000 is one of the file formats accepted in the INSPIRE Directive’s technical guidelines.


The large JPEG 2000 orthophotos of the National Land Survey of Finland cannot be opened with standard photo editing software, but they can be opened with QGIS, ESRI software and the GIMP photo editor.


PNG

PNG is a very common image file format, especially on the internet. The image data in a PNG file has been compressed using a non-destructive method that is particularly effective at compressing images with many uniform areas of the same colour. Traditional maps are just that. PNG is fine for viewing images, as long as the images are relatively small, but opening large PNG images requires a lot of memory and can be slow.
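
The effect of uniform areas can be illustrated with DEFLATE, the lossless compression method that PNG uses internally. The sketch below compresses raw bytes with Python's zlib module rather than writing real PNG files, so it ignores PNG's filtering step and file structure; the byte arrays are synthetic stand-ins for image data, and deflate_ratio is a hypothetical helper.

```python
import random
import zlib


def deflate_ratio(pixels):
    """Compressed size divided by original size, using DEFLATE via zlib."""
    return len(zlib.compress(pixels)) / len(pixels)


size = 100_000
# One flat colour, like a large single-colour area on a map.
uniform = bytes([200]) * size
# Pseudo-random values, a worst case with no repeated patterns to exploit.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(size))

print(f"uniform: {deflate_ratio(uniform):.3f}")
print(f"noisy:   {deflate_ratio(noisy):.3f}")
```

The uniform data shrinks to a tiny fraction of its size, while the noisy data barely compresses at all.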


PNG images can be viewed with both standard image viewers and geospatial data software. A separate .pgw file associated with the map file is used by geospatial data software to identify the location of the image, but not its coordinate system.
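
As a rough illustration of what such a world file contains, the sketch below assumes the common six-line layout (x pixel size, two rotation terms, negative y pixel size, then the x and y coordinates of the centre of the upper-left pixel). This layout is not described in the text above, and the function names are hypothetical.

```python
def parse_world_file(text):
    """Parse the six numbers of a world file into affine coefficients (a, b, c, d, e, f)."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    # Order in the file is A, D, B, E, C, F.
    a, d, b, e, c, f = values
    return a, b, c, d, e, f


def pixel_to_map(coeffs, col, row):
    """Convert a pixel position (column, row) to map coordinates (x, y)."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y
```

Note that the result is only a pair of numbers: nothing in the world file says which coordinate system they belong to, which is why that information has to come from elsewhere.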


JPEG

Like PNG, JPEG is a very common image format. JPEG compression is lossy, i.e. it permanently changes the image data. JPEG compression was developed for compressing photographs, and is therefore best suited to images with continuous, gradual changes in tone, such as aerial photos. The method exploits the weaknesses of the human eye and removes information that the eye would not discern. This improves the compression ratio, but the final result is not optimal for machine image interpretation. JPEG can also compress standard maps into a smaller size than PNG, but the quality of the maps may deteriorate visibly, especially in lines and text.


Like PNG, JPEG files are best suited for the viewing of fairly small images. To decompress a JPEG, the entire file must first be opened, which means that opening large images requires a lot of memory.


JPEG images can be viewed with both standard image viewers and geospatial data software. A separate .jgw file associated with the map file is used by geospatial data software to identify the location of the image, but not its coordinate system.


ARC/INFO ASCII Grid

An ASCII Grid is a raster stored as a text file. The header lines of the file specify the size of the raster image in rows and columns, followed by the numerical values of each pixel in the image, one after the other. The format only supports single-channel rasters. In a typical use case, the raster values represent a measurement result. In an elevation model raster, the measurement result is the elevation of the pixel’s centre point in the field.
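
The structure described above is simple enough to read without a geospatial library. The following is a minimal sketch, assuming the commonly used header keywords (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value), which the text above does not list; parse_ascii_grid is a hypothetical helper.

```python
def parse_ascii_grid(text):
    """Split ASCII Grid text into a header dict and a list of pixel rows (floats)."""
    header = {}
    values = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Header lines are "keyword value"; data lines start with a number.
        if parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
        else:
            values.extend(float(p) for p in parts)
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    if len(values) != ncols * nrows:
        raise ValueError("pixel count does not match ncols * nrows")
    # Cut the flat list of values into rows of ncols pixels each.
    rows = [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, rows
```

The parser reads the whole file into memory as Python floats, which is fine for small tiles but illustrates why binary formats are faster in practice.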


ASCII Grid is a simple file format that can be viewed with a regular text editor, if necessary. Applications do not require extensive software libraries to use it, as is the case with GeoTIFF, for example. However, GeoTIFF and the other binary file formats work faster than ASCII Grid in applications.


ASCII Grid is supported by, for example, ESRI’s products and QGIS.


CSV

Comma-Separated Values (CSV) is a text file where the data fields are separated by a delimiter. The delimiter is often a comma, as the name of the file format suggests, but the CSV format is non-standardised, and other characters such as a semicolon and a colon are also commonly used as the delimiter.


The National Land Survey of Finland offers control point register data in the CSV format. A semicolon is used as the delimiter.
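
A semicolon-delimited file can be read with most CSV tools once the delimiter is set explicitly. Here is a short sketch with Python's csv module; the function name and the column names in the usage example are made up for illustration and do not reflect the actual layout of the control point register files.

```python
import csv
import io


def read_semicolon_csv(text):
    """Parse semicolon-delimited CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)
```

For example, read_semicolon_csv("id;north;east\nP1;6900000.0;500000.0\n") yields one dictionary per data row, with all values still as strings that the caller converts to numbers as needed.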


CSV files are most commonly processed with text editors or Excel, but they can also be read with geospatial data software.


CityGML

CityGML is a data model and file format for three-dimensional objects adopted by the OGC. The National Land Survey of Finland uses CityGML for 3D buildings.
Special 3D software is required for CityGML files.