Executive summary
Data about processes in our cities tend to be difficult to find. What information is available and where it is stored are often obscure, which creates barriers to access, particularly for researchers tackling urban issues. This City Research Insight reports on the Canadian Urban Data Catalogue (CUDC), a catalogue that facilitates access to urban data.¹ It is housed at the Urban Data Research Centre (UDRC), which addresses major challenges around urban data, particularly access and information. An affiliated initiative of the School of Cities, UDRC harnesses the potential of data to focus on solutions to urban challenges.
About the Centre
The Urban Data Research Centre (UDRC) leverages the potential of urban data to enhance the design, planning, and operations of cities, and to improve the delivery and impact of programs and services for their residents. UDRC’s mission is to break down traditional data ‘silos’ within cities and give cities greater interoperability and control over their digital infrastructure. Our activities include ontologies for city data, city data standards, City Digital Twins, and City Data Governance.

Canadian Urban Data Catalogue
With the availability of open data platforms such as CKAN and Dataverse, there are a growing number of repositories – operated by governments, NGOs, and for-profit organizations – that contain urban data. Although the digital landscape is inundated with data, it can be a difficult process for researchers to find specific information suitable and relevant for their study. This paradoxical scarcity in an age of abundance is due to inadequate metadata, a lack of audience-tailored data presentation, and challenges in accessibility depending on where data is stored.² The single greatest barrier to finding relevant datasets is that they are distributed across many different data platforms that are variably open and closed, making them exceedingly difficult to find. In addition, inconsistencies in the accompanying metadata often arise, impeding the identification and comparison of datasets. Furthermore, the domain-specific nature of metadata necessitates a familiarity with specialized terminology, adding another layer of complexity in cross-portal searches.
The Canadian Urban Data Catalogue (CUDC) facilitates the discovery of relevant Canadian urban data by providing a catalogue of datasets regardless of where they are stored. CUDC documents both open and closed datasets, plus data web services, and currently contains over 40,000 catalogued urban datasets. It is an open catalogue: anyone can search it, add new entries, and review datasets. CUDC is not meant to be a repository, though it has that capability. Instead, it is designed to be a catalogue – a directory for discovering datasets relevant to one’s needs.

CUDC Team
- Professor Mark Fox, Director, Urban Data Research Centre, School of Cities
- Dr. Bart Gajderowicz, Research Associate and Executive Director, Urban Data Research Centre, School of Cities
- Dishu Lyu, Research Assistant, Urban Data Research Centre, School of Cities
Plus contributions by numerous urban data curators
Metadata
Many possible entities and attributes can be used to catalogue an urban dataset. To manage this multitude, UDRC aims to establish a dataset metadata model that identifies and sequences the metadata a dataset cataloguer should provide.
Here, “metadata” refers to the entities and attributes that:
- enable the discovery of a dataset,
- help determine its suitability or relevance, and
- clarify who may use it and how.
Toward this goal, such a metadata model should comprise a series of levels, beginning with a base case (i.e., “no capability”) and progressing through increasingly advanced stages of cataloguing capability. This approach parallels the Capability Maturity Model for Software,³ which guides software organizations in managing development and maintenance processes while moving toward software engineering and management excellence. Each successive step in the maturity “ladder” corresponds to a more refined process capability. In the context of dataset metadata, the lowest level is characterized by the absence of any metadata specification capability. From there, the question becomes how to define the next and subsequent maturity levels so that the effort needed to catalogue a dataset is balanced with providing enough metadata for discovery, assessing suitability or relevance, and clarifying who may use it and how.
CKAN and Dataverse
CKAN, the Comprehensive Knowledge Archive Network, is an open-source data management system that can be used to store and distribute open data. It is widely used by public institutions and not-for-profits to store and share data freely and openly.
Similarly, Dataverse is an open-source web application that is used to share, preserve, cite, explore, and analyze research data. The application makes data readily accessible to others who seek to replicate similar work, while preserving academic credit.
Dataset Metadata Capability Maturity Model (DMCMM)
At the core of CUDC is the dataset metadata capability maturity model (DMCMM). The DMCMM provides a framework and vocabulary for representing a dataset’s metadata. The higher the level of maturity, the more complete the catalogue entry’s metadata. Based on Fox et al. (2024), the maturity model partitions and stratifies a dataset’s metadata attributes, with the lowest level of maturity focusing on attributes that facilitate searching by topic and by the spatial and temporal aspects of datasets. Other levels focus on licensing, governance, adherence to FAIR principles, Indigenous data principles, and so on. The Digital Governance Council has accepted the metadata model as a standard.⁴ It is also available as an open-source CKAN extension.
Level 1. Focus on general description of the dataset supplemented with temporal and geospatial information
Level 2. Focus on the content of the dataset, authorship and ownership
Level 3. Focus on additional content and versioning information, incorporates some FAIR principles, and expands on the temporal and geospatial resolution of the data
Level 4. Focus on privacy and identifying data of individuals captured by the dataset, including guidelines for Indigenous communities
Level 5. Focus on FAIR principles: Findable, Accessible, Interoperable, Reusable
Level 6. Focus on the statistics and quality of the data in the dataset
Dataset Metadata Maturity Levels (adapted from Fox, et al., 2024)
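To make the idea of a maturity ladder concrete, the following is a minimal sketch of how a catalogue entry's level might be computed. It assumes that levels are cumulative (an entry reaches a level only if every lower level is also satisfied), and the attribute names in `LEVEL_ATTRIBUTES` are hypothetical placeholders loosely grouped by the level themes above; they are not the DMCMM vocabulary, which is defined in Fox et al. (2024).

```python
# Hypothetical attribute names, loosely grouped by the six level themes.
# These are illustrative only and are NOT the actual DMCMM attributes.
LEVEL_ATTRIBUTES = {
    1: {"title", "description", "temporal_coverage", "spatial_coverage"},
    2: {"content_summary", "author", "owner"},
    3: {"version", "temporal_resolution", "spatial_resolution"},
    4: {"contains_personal_data", "indigenous_data_guidance"},
    5: {"identifier", "access_url", "format", "license"},
    6: {"record_count", "quality_notes"},
}


def maturity_level(entry):
    """Return the highest level whose attributes, and those of every
    lower level, are all present and non-empty in the entry.

    Level 0 stands for the base case: no metadata capability.
    """
    provided = {
        key for key, value in entry.items() if value not in (None, "", [])
    }
    level = 0
    for candidate in sorted(LEVEL_ATTRIBUTES):
        # Stop at the first level with a missing attribute (cumulative ladder).
        if not LEVEL_ATTRIBUTES[candidate] <= provided:
            break
        level = candidate
    return level
```

Under these assumptions, an entry with only a title, description, and temporal and spatial coverage would sit at Level 1, and adding authorship, ownership, and a content summary would move it to Level 2.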
Inside the CUDC
The current catalogue contains over 43,000 catalogue entries across seven domains:

1. Parks and Recreation (111 catalogue entries)
This category is composed of datasets ranging from recreation to conservation reserves, and includes datasets about park use, attendance statistics, and available amenities.
2. Housing (326 catalogue entries)
These datasets cover topics such as shelter system flow in the City of Toronto and housing affordability, housing tenures, and vacancy rates of housing across the country.
3. Homelessness (158 catalogue entries)
Many of the datasets in the CUDC on homelessness are provided by the Homeless Hub, which provides data such as the homeless count. Data is available for multiple cities and provinces across Canada, and includes topics such as housing, shelters, PiT counts, and social assistance.
4. Transportation (432 catalogue entries)
These datasets are categorized into four subgroups: origin/destination, stop analytics, zone management, and traffic. These subgroups provide information on the flow of traffic between regions, provinces, and national zones. They also provide traffic behaviour data such as movements at intersections and stop durations.
5. Farming and Economy (110 catalogue entries)
Data on farming and economy are subcategorized into groups such as livestock, economy and industry, and agriculture, and provide information on topics including the Census of Agriculture, livestock, and the bio-food industry.
6. Society (1,170 catalogue entries)
This category contains data of interest to researchers focused on cultural and societal demographics. Topics include demographic maps, data on law and aid response, and diaspora data across the country.
7. Environment (988 catalogue entries)
Datasets under this topic are subcategorized into different groups such as air quality, greenness, environment monitoring and reporting, and vegetation. Examples of data in this set include smoke exposure, forest fire severity levels, water quality data, and sensitivity of bodies of water to climate change in different regions.
Populating the CUDC
1. In 2022, UDRC convened two expert panels on transportation and housing to identify the data-related problems in their fields, what data is required to address them, and what data is not (easily) available.⁵ Each field experiences unique research problems that can limit decision-making, and the expert panels described gaps in their research and efforts to fill those gaps by collecting datasets from various sources. Some of the challenges involve the lack of representation of minority and marginalized populations in available data. For example, the panels expressed that some of their research problems are related to bridging equity gaps in populations. The researchers are seeking ways to address racial and physical inequities.
2. UDRC has trained numerous undergraduate and graduate students to search for and catalogue datasets containing urban data. To date, they have catalogued over 1,000 datasets across Canada on topics such as housing, transportation, culture, the environment, and parks and recreation.
3. UDRC has developed a mapping application for the CKAN platform. It uses the CKAN API to extract catalogue entries, maps them into the DMCMM, then uploads them into CUDC. We have extracted over 40,000 datasets this way since 2024.
4. UDRC is developing mapping applications for other widely used open data platforms, like Dataverse and ArcGIS.
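The extract-and-map steps of such a mapping application can be sketched roughly as follows. This is an illustration under stated assumptions, not UDRC's implementation: it uses CKAN's public Action API endpoint `package_search` and standard CKAN package fields (`title`, `notes`, `tags`, `license_id`, `organization`, `name`), while the function names and the output keys in `to_catalogue_entry` are hypothetical stand-ins rather than DMCMM terms. The upload step into CUDC is omitted because its interface is not described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(portal, query="", rows=100, start=0):
    """Build a CKAN Action API package_search URL for one page of results."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def fetch_packages(portal, query="", rows=100):
    """Yield every matching CKAN package, requesting one page at a time."""
    start = 0
    while True:
        with urlopen(search_url(portal, query, rows, start)) as response:
            result = json.load(response)["result"]
        if not result["results"]:
            break
        yield from result["results"]
        start += len(result["results"])
        if start >= result["count"]:
            break


def to_catalogue_entry(package, portal):
    """Map a CKAN package dict to a hypothetical catalogue entry.

    The output keys are illustrative and are not the DMCMM vocabulary.
    """
    return {
        "title": package.get("title") or package.get("name", ""),
        "description": package.get("notes") or "",
        "keywords": [tag["name"] for tag in package.get("tags", [])],
        "license": package.get("license_id"),
        "publisher": (package.get("organization") or {}).get("title"),
        "source_url": f"{portal.rstrip('/')}/dataset/{package.get('name', '')}",
    }
```

A caller would iterate over `fetch_packages(portal_url)` for a given CKAN portal and pass each package through `to_catalogue_entry` before uploading. Keeping the mapping as a pure function separates it from network access, so it can be checked against sample records without contacting a portal.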
The future
The Urban Data Research Centre (UDRC) and its Canadian Urban Data Catalogue (CUDC) are preparing for major developments. UDRC’s main goal is to promote the Dataset Metadata Capability Maturity Model (DMCMM), which helps organize and find urban data more easily. By encouraging official recognition and adoption through the Digital Governance Council, UDRC hopes to see more datasets follow clear, standardized metadata guidelines. The DMCMM will be enhanced to provide expanded search capability and dataset evaluation. UDRC also plans to broaden the catalogue’s reach by adding mapping tools for well-known platforms, such as ArcGIS and Dataverse, that are currently used by open data portals and government agencies. Collaboration with experts from government, academia, and industry is another key priority. These partnerships will help identify missing data in areas like housing potential, governmental regulations, and migration of communities. UDRC plans to raise awareness of additional metadata, and encourage its contribution, by focusing on trust and data provenance, data quality, data governance, and Indigenous data principles.
The CUDC’s open and community-focused approach is central to its plans for continued expansion. A range of contributors, including researchers, policy experts, and community members, are invited to add or update dataset entries, share feedback, and suggest fresh ideas. This shared effort encourages cross-disciplinary cooperation, covering data on topics such as healthcare, housing, the environment, technologies, and urban planning. Because the catalogue is built on open-source software, developers and data scientists can refine and expand its functionality. UDRC encourages others to install its CKAN plugin and experiment with additional features. With this flexible and inclusive model, UDRC can stay adaptable and responsive to new research questions, helping shape informed, data-based solutions for Canadian cities.
How to contribute?
Panelists and data partners have helped identify gaps in the research, and UDRC is always looking to expand the Canadian Urban Data Catalogue. Members of the community are encouraged to use the catalogue to search for datasets on the urban issues it covers. UDRC welcomes the addition and curation of new interdisciplinary datasets and is looking for collaborative opportunities to help grow this project. To learn more about UDRC or to collaborate, use the UDRC contact form or email Dr. Bart Gajderowicz, UDRC Executive Director, at bartg@mie.utoronto.ca.
For more information about CUDC, please see Mark S. Fox, Bart Gajderowicz, and Dishu Lyu, “A Capability Maturity Model for Urban Dataset Meta-data.” arXiv preprint arXiv:2402.05211 (2024).
1. Urban data is the collection and analysis of information related to the city and urban environments. It comprises data collected from several sources and in multiple ways, including census data, transportation data, data about infrastructure and services, and labour market data, to name a few. Data can be generated and collected in many ways, e.g. via surveys, reporting, GPS sensors, and environmental monitoring.
2. Adegboyega Ojo, Porwol, L., Waqar, M., Stasiewicz, A., Osagie, E., Hogan, M., Harney, O., and Zeleti, F.A., “Realizing the Innovation Potentials from Open Data: Stakeholders’ Perspectives on the Desired Affordances of Open Data Environment”, in Working Conference on Virtual Enterprises (Springer, Cham, 2016), 48–59.
3. Mark C. Paulk, Curtis, B., Chrissis, M. B., and Weber, C. V., “Capability maturity model, version 1.1”, IEEE Software 10, no. 4 (1993): 18–27.
4. Fox, Mark S., Bart Gajderowicz, and Dishu Lyu, “A Capability Maturity Model for Urban Dataset Meta-data.” arXiv preprint arXiv:2402.05211 (2024).
5. Pandya, M., Transportation problems and data requirements: Report of the Transportation Panel, (2023a). https://storage.googleapis.com/wzukusers/user-12947767/documents/c4609af45a0546deb1a4468616c808ad/UDC%20Transportation%20Panel%20Report%20v3.pdf
   Pandya, M., Affordable housing problems and data requirements: Report of the Affordable Housing Panel, (2023b). https://storage.googleapis.com/wzukusers/user-12947767/documents/32479acd560c4ed1be6d1f8a393d8846/UDC%20-%20Affordable%20Housing%20Panel%20Report%20-%20v3.pdf