Changes in the Dataset Structure in DBP 2.0
In version 2 of the platform, the dataset concept changes: a dataset now contains data files for more than one year. Previously we created a new dataset for every timetable year and included the year in the dataset name, for example “Haltestellen (SHAPE) 2023“. That dataset contained only data for the year 2023; for the previous year there was “Haltestellen (SHAPE) 2022“, again containing only the data from that year.
Now we define a single dataset, called “Haltestellen (SHAPE)“, which contains one data file for every year. In the UI the data files are listed in a table below the general description of the dataset, together with the year each data file covers and the date when it was last updated.
Normally, only the current year will be updated; the data for previous years will be kept for historical reasons (as before).
Because of this change in the dataset structure, there are corresponding changes in the download API, which must be taken into consideration if you want to download data programmatically - see below.
Prerequisites
You have to be aware of the following when you plan to use scripts/programs to download data sets from data.mobilitaetsverbuende.at:
Data sets can only be downloaded by registered users
The authentication follows the OpenID Connect specification (implemented by a Keycloak server)
A user must be registered manually (registration cannot be scripted)
Data sets can only be downloaded after the user agreed to the license
Accepting the license must be done manually
The preparation steps would then be:
Register at data.mobilitaetsverbuende.at and confirm the email address you provided during registration
Take a look at the data sets page, click on the data set you want to download, show the license and accept it
Write the credentials (user name and password) you used for the registration into a config file for your script
Use the credentials in the script to authenticate to Keycloak and obtain an access token
Use the access token to retrieve data sets and download data set files from the platform
OpenID Connect Parameters - Changed in DBP 2.0
Use the following information to obtain an access token on behalf of your registered user:
Access Token Endpoint: https://user.mobilitaetsverbuende.at/auth/realms/dbp-public/protocol/openid-connect/token
Changed: Client ID: dbp-public-ui
Grant Type: password
Credentials (username & password): the ones you used for the registration
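The parameters above translate into a single call against the token endpoint. A minimal sketch, assuming curl and jq are available; the username and password values are placeholders for your own registration credentials:

```shell
# Token endpoint and client id as given above
TOKEN_URL="https://user.mobilitaetsverbuende.at/auth/realms/dbp-public/protocol/openid-connect/token"
CLIENT_ID="dbp-public-ui"

# Placeholders - replace with the credentials from your registration
MY_USERNAME="me@example.com"
MY_PASSWORD="changeme"

# Request an access token via the password grant; jq extracts the
# access_token field from the JSON reply. The function is only defined
# here, not called, so no request is sent yet.
get_access_token() {
  curl -sS -X POST \
    -d "client_id=${CLIENT_ID}" \
    -d "grant_type=password" \
    -d "scope=openid" \
    -d "username=${MY_USERNAME}" \
    -d "password=${MY_PASSWORD}" \
    "${TOKEN_URL}" | jq -r '.access_token'
}
```

The returned token is then sent as an `Authorization: Bearer <token>` header in all subsequent API calls.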
Changes in DBP Public API
Because the link to the DBP Public Swagger UI will show the old API documentation (for version 1), here are the most important changes you will need to implement a file download:
to get the list of datasets from the endpoint /api/public/v1/data-sets you must pass the HTTP parameter tagFilterModeInclusive and set it to true (see the example below, function get_dataset_list)
the result of the dataset list is a JSON list like the following, where a dataset can contain several active versions (one per year), each with its latest creation date:
[
  {
    "id": "27",
    "name": "OJP Übergabepunkte",
    "active": true,
    "license": { "id": "1", "name": "MVO Datenlizenz alt" },
    "descriptionDe": "Dieser Datensatz enthält die aus/für Österreich relevanten Übergabepunkte für OJP Services. Das Österreichische OJP Service befindet sich derzeit im Aufbau durch die Verkehrsauskunft Österreich (VAO), weitere Informationen sind unter www.verkehrsauskunft.at und www.alpine-space.eu/linkingalps zu finden",
    "descriptionEn": "This data set contains the exchange points relevant from / for Austria for OJP Services. The Austrian OJP Service is currently under development by Verkehrsauskunft Österreich (VAO), further information can be found at www.verkehrsauskunft.at and www.alpine-space.eu/linkingalps ",
    "documentationUrlDe": "https://arge-oevv.atlassian.net/wiki/spaces/GEO/pages/185892865/OJP+bergabepunkte",
    "documentationUrlEn": "https://arge-oevv.atlassian.net/wiki/spaces/GEO/pages/185892865/OJP+bergabepunkte",
    "termsOfUseUrlDe": "https://arge-oevv.atlassian.net/wiki/spaces/DBP/pages/85524481/Nutzungsbedingungen",
    "termsOfUseUrlEn": "https://arge-oevv.atlassian.net/wiki/spaces/DBP/pages/185237505/Terms+of+Use",
    "tags": [
      { "id": "4", "valueDe": "CSV", "valueEn": "CSV", "numberOfDataSets": 0 },
      { "id": "30", "valueDe": "OJP", "valueEn": "OJP", "numberOfDataSets": 0 },
      { "id": "25", "valueDe": "Haltestellen", "valueEn": "Stops", "numberOfDataSets": 0 }
    ],
    "activeVersions": [
      {
        "id": "93",
        "active": true,
        "year": "2023",
        "dataSetVersion": {
          "id": "2418",
          "created": "2022-08-03T21:05:04.385Z",
          "file": { "originalName": "exchangepoints.zip", "size": "53339" },
          "deleted": false
        }
      }
    ],
    "latestVersions": [
      {
        "id": "93",
        "active": true,
        "year": "2023",
        "dataSetVersion": {
          "id": "2418",
          "created": "2022-08-03T21:05:04.385Z",
          "file": { "originalName": "exchangepoints.zip", "size": "53339" },
          "deleted": false
        }
      }
    ]
  }
]
to download the data of a dataset version you must give the year of the version; this means you need to use an endpoint like /api/public/v1/data-sets/27/2023/file, where 27 is the dataset id and 2023 is the year for which you want to download the latest data.
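The two changed calls can be sketched as follows, assuming the dataset id 27 and the year 2023 from the example above; TOKEN stands for an access token obtained as described in the previous section:

```shell
DBP_BASE="https://data.mobilitaetsverbuende.at"
DSID="27"    # dataset id taken from the list response
YEAR="2023"  # the version year is now part of the download path

# v2: the list endpoint requires tagFilterModeInclusive=true
LIST_URL="${DBP_BASE}/api/public/v1/data-sets?tagFilterModeInclusive=true"
# v2: the download endpoint contains the year between the id and /file
FILE_URL="${DBP_BASE}/api/public/v1/data-sets/${DSID}/${YEAR}/file"

# List all datasets as JSON (functions are only defined here, not called)
list_datasets() {
  curl -sS -H "Authorization: Bearer ${TOKEN}" \
       -H "Accept: application/json" "${LIST_URL}"
}

# Download the zip file of the given dataset version
download_file() {
  curl -sS -H "Authorization: Bearer ${TOKEN}" \
       -H "Accept: application/zip" "${FILE_URL}"
}
```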
Example Script - Changed in DBP 2.0
This example script demonstrates how to get the access token from Keycloak and how to use it to automatically download a data set file. The OpenID Connect standard is well documented on the internet. For all supported operations of the DBP REST API and the structure of the returned information, it is strongly recommended that you consult the DBP Public Swagger UI. (Important: unfortunately this link will still show the old documentation until November 22nd, 2023.)
Changes for DBP 2.0:
Client ID has changed: you must now use dbp-public-ui
Function get_access_token has changed: a new parameter (-d "scope=openid") was added
Function get_dataset_list has changed: the HTTP parameter tagFilterModeInclusive is appended to the URL and must be set to true
Function download_dataset has changed: the endpoint must contain the year of the dataset version that you want to download (in this example it is passed as the third parameter of the function)
#!/bin/bash
# vim: ts=2 sw=2 et
#
# This example demonstrates the download of a dataset from Mobilitaetsverbuende Oesterreich OOG DBP per script.
# The user credentials must belong to an active user who registered to the DBP
# (data.mobilitaetsverbuende.at) and already accepted the corresponding licences.
#
# Some constants:
KEYCLOAK="https://user.mobilitaetsverbuende.at"
REALM="dbp-public"

# Typically you would store the credentials in a configuration file
MY_USERNAME="me@example.com"
MY_PASSWORD="abcdefghijkl"

# DBP access
DBP_BASE=https://data.mobilitaetsverbuende.at
CLIENT_ID="dbp-public-ui"
ENDPOINT_DATA_SETS=/api/public/v1/data-sets

echo Auth on $KEYCLOAK, DBP on $DBP_BASE

# Get the access token
function get_access_token() {
  curl -sS -k -X POST \
    -d "client_id=${CLIENT_ID}" \
    -d "username=${MY_USERNAME}" \
    -d "password=${MY_PASSWORD}" \
    -d "grant_type=password" \
    -d "scope=openid" \
    "${KEYCLOAK}/auth/realms/${REALM}/protocol/openid-connect/token" | jq -r '.access_token'
}

# List the existing datasets
# Needs access token
function get_dataset_list() {
  local token=$1
  curl -sS -k "${DBP_BASE}${ENDPOINT_DATA_SETS}?tagFilterModeInclusive=true" \
    -H "Accept: application/json" \
    -H "Authorization: Bearer $token"
}

# Get the given dataset
# Needs access token and dataset id
function get_dataset() {
  local token=$1
  local id=$2
  curl -sS -k ${DBP_BASE}${ENDPOINT_DATA_SETS}/${id} \
    -H "Accept: application/json" \
    -H "Authorization: Bearer $token"
}

# Download the given dataset
# Needs access token, dataset id and year
function download_dataset() {
  local token=$1
  local id=$2
  local year=$3
  curl -sS -k ${DBP_BASE}${ENDPOINT_DATA_SETS}/${id}/${year}/file \
    -H "Accept: application/zip" \
    -H "Authorization: Bearer $token"
}

# Get the access token
echo Take access token
token=$(get_access_token)
echo $token

# Let's say we are interested in only one dataset with the name "Liniennetz (JSON)", and from this dataset
# we want to download the data for the year 2023.
# We list all datasets and for the wanted one we get the id.
# Note that here we make all these transformations with jq, but in another programming language this can
# be done with language specific means
WANTED='Liniennetz (JSON)'
YEAR='2023'

echo Take dataset id
dsid=$(get_dataset_list $token | jq -r "map({ name, id } | select( .name == \"$WANTED\") | .id)[]")

# Now we can get further information about the wanted dataset, for example the active dataset versions,
# which now is an array with one version per year, and we could further check if we already have that file
# (by name or by creation date/time).
# Here we just display that information and then proceed to download
echo Latest versions of Dataset $dsid
get_dataset $token $dsid | jq '.activeVersions'

# Now we download the file
echo Download the file
ZIPFILE=my-download-$(date +%Y%m%d%H%M%S).zip
download_dataset $token $dsid $YEAR > $ZIPFILE

echo Everything done, see $ZIPFILE