The Do’s & Don’ts of Open Source Building Performance Data


This is a Guest Post

OpenHVAC periodically asks professionals across the community to generate content that we feel is relevant to our community members. OpenHVAC is not associated with, and does not receive any compensation for, any of the products and services listed in this article.

Dr. Philip Agee is an Assistant Professor of Building Construction and the Assistant Director of the Virginia Center for Housing Research (VCHR) at Virginia Tech. His research advances industrial and systems engineering design and evaluation methods across the lifecycle of the built environment (e.g., design, construction, operation).
– Philip Agee, Virginia Tech

Significant data is generated in the built environment. Design and construction documents, performance simulations, commissioning results, and building performance data (BPD) are generated for all modern buildings. Today, system advancements in energy efficiency, integration, monitoring, and connectivity are expanding our design and facility management options throughout the built environment. Internet-enabled monitoring, ubiquitous user interfaces, machine learning, and artificial intelligence are facilitating analytical approaches not previously afforded to our industry. As buildings become more complex, the need to measure, analyze, and share BPD has become salient. 

Open datasets are critical for advancing building performance and analytical methods. Open BPD can be leveraged to calibrate design simulations and inform future design, construction, and operational decisions. Open BPD can facilitate benchmarking performance across building types, vintages, and geographic distributions [1]. Open BPD can also be used to benchmark machine learning techniques for the built environment [2]. Simply put, open BPD are a critical component in the systems approach necessary to improve outcomes in the built environment, including, but not limited to, reducing carbon emissions and operating costs and improving human experiences.

The Architecture, Engineering, Construction, and Operation (AECO) industry rarely collects or shares BPD systematically today. Researchers increasingly collect BPD, yet rarely share it in open-source data repositories [3]. In recent years, however, there has been a trend to develop and share open-source BPD [4–12]. For example, there are recent open BPD contributions focused on occupant behavior impacts on energy use and indoor environmental quality (IEQ). These contributions span human-building interactions with appliances [5], heat pumps [7], and natural ventilation systems [8]. Other recent contributions span energy use across building typologies (e.g., commercial and residential) [9–12]. This article contributes to the recent trend toward open BPD.

The Why, What, How, and When of Building Performance Data

Why, What, and How? 

Reliable data collection can be expensive and time-consuming. Just because you can collect data does not mean that you should collect data. Even as a professional researcher, I am lazy and only collect data when I absolutely have to. Collecting data, developing a database (defining variables, data types), sorting/cleaning, analyzing, validating, and synthesizing findings into useful information is hard. Dr. Joe Lstiburek says it best: “life is hard.” I would suggest you not collect data just because we now have internet-enabled sensing and logging capabilities. Instead, I suggest only collecting data to answer a specific question. The research question guides what data you need to collect (e.g., quantitative, qualitative) and how you need to measure/analyze the data (e.g., for diagnostic purposes or for long-term reporting) to answer the question.

It is important to remember that most building performance questions require both quantitative and qualitative data. As engineers, we cannot just rely on numbers; we have to talk to people. Quantitative data can tell me if something is (statistically) significant, but it cannot tell me if the phenomenon I am studying is important. Conversely, qualitative data can tell me if something is important, but it cannot tell me if it is significant. When we are working on complex, systems-based problems that often relate to human factors, the mixing of quantitative and qualitative methods is necessary to unpack these interactions and answer our research question(s).

Think about solving a thermal comfort problem in a home. The home is not uncomfortable, the people are. To answer the thermal comfort question, you need to mix quantitative data measured empirically (e.g., operative temperature, airspeed, relative humidity) with qualitative data (e.g., occupant perceptions collected from surveys, interviews). So how do you do this? Table 1 provides an example of how you can align research questions with aims for the data collection, the method(s), and data. I do this at the start of every research project. This is a fundamental research skill that is taught during Ph.D. training.

Research question 1: What is the median electricity end-use profile for each apartment type?

Aim 1.1: Characterize hourly end-uses/costs for:
• heat pump domestic water heating
• miscellaneous electric loads (MELs)
Method(s):
• Circuit-level energy monitoring
• Empirical analysis
Data:
• unit $/end-use
• unit HDD65 / CDD65
• simulation outputs

Aim 1.2: Develop resident energy use profile to target energy literacy intervention
Method(s):
• Resident survey(s)
• Resident interview(s)
• Descriptive statistics
Data:
• beliefs system interactions

Table 1. Example data collection plan
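As a rough illustration of how the quantitative side of research question 1 in Table 1 might be worked up, here is a minimal Python sketch. It is not the analysis from any published dataset. The readings, apartment types, unit IDs, and function names are all hypothetical, and it assumes that HDD65 and CDD65 refer to heating and cooling degree days computed from daily mean outdoor temperatures against a 65 °F base.

```python
from collections import defaultdict
from statistics import median

# Hypothetical hourly circuit-level readings (illustrative values only):
# (apartment_type, unit_id, end_use, hour_of_day, kWh)
readings = [
    ("1BR", "A101", "water_heating", 7, 0.40),
    ("1BR", "A102", "water_heating", 7, 0.60),
    ("1BR", "A103", "water_heating", 7, 0.50),
    ("1BR", "A101", "MELs", 7, 0.10),
    ("1BR", "A102", "MELs", 7, 0.30),
    ("2BR", "B201", "water_heating", 7, 0.90),
    ("2BR", "B202", "water_heating", 7, 0.70),
]


def median_end_use_profile(rows):
    """Median kWh for each (apartment type, end use, hour of day)."""
    groups = defaultdict(list)
    for apt_type, _unit_id, end_use, hour, kwh in rows:
        groups[(apt_type, end_use, hour)].append(kwh)
    return {key: median(values) for key, values in groups.items()}


def degree_days(daily_mean_temps_f, base_f=65.0):
    """Total (HDD, CDD) for a list of daily mean temperatures in deg F.

    Assumes the common definition: each day contributes the positive
    difference between the base temperature and the daily mean.
    """
    hdd = sum(max(0.0, base_f - t) for t in daily_mean_temps_f)
    cdd = sum(max(0.0, t - base_f) for t in daily_mean_temps_f)
    return hdd, cdd


profile = median_end_use_profile(readings)
hdd65, cdd65 = degree_days([50.0, 65.0, 72.0])

for key in sorted(profile):
    print(key, round(profile[key], 3))
print("HDD65:", hdd65, "CDD65:", cdd65)
```

The median is used here because the research question asks for a median profile; it is also less sensitive than the mean to a single unusual household. The qualitative side of the plan (Aim 1.2) still has to come from surveys and interviews.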

When to Collect Data?

When should you collect building performance data? It depends. I do want to introduce an approach that is fundamental and ubiquitous in the design of human-centered systems, but has not found a home in the built environment (pun intended). Formative and summative evaluations help us align when and how we should collect data with our research question(s).

Formative and Summative Evaluation Approaches

Formative evaluation helps you “form” the design and construction of a product or system [13]. If you are familiar with rapid prototyping, that is formative. In the built environment context, design reviews, energy simulations and load calculations, pre-drywall visual inspections and testing, and commissioning are examples of formative evaluations. Data from formative evaluations reduce risk by helping us align the system with our requirements. The more user feedback you get, the better you can adjust and reduce the risk of the product sucking – a non-technical way of saying “not meeting system requirements.” Formative data do not (typically) inform policy decisions or contribute to the body of scientific knowledge of a problem area.

Conversely, summative evaluations help you “sum” the outputs and outcomes of a product or system [13]. In the built environment context, summative evaluations could be performed with long-term monitoring of energy use and/or indoor environmental quality data (e.g., temperature, relative humidity, occupant surveys) after a building has been placed in service. Summative evaluations typically do inform policy and contribute to the body of scientific knowledge of a problem area. These evaluations require rigorous data collection procedures and aim for statistical significance.

The takeaway? Do not just collect data for data’s sake. Align your data collection efforts to answer a research question, and identify how and when to evaluate the results (e.g., use of formative and/or summative evaluation) to improve outcomes in the built environment. 
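One way to make this takeaway concrete is to write the plan down as a small data structure before any sensor is installed, and then check every planned data stream against it. The sketch below is only an illustration under assumptions: the class names, the `unjustified_streams` helper, the stream names, and the choice to label Aim 1.1 as summative are all hypothetical, not part of any standard tool or of the author's workflow.

```python
from dataclasses import dataclass, field


@dataclass
class Aim:
    label: str
    description: str
    evaluation: str  # "formative" or "summative"
    methods: list = field(default_factory=list)
    data: list = field(default_factory=list)


@dataclass
class ResearchQuestion:
    text: str
    aims: list = field(default_factory=list)


def unjustified_streams(question, planned_streams):
    """Return planned data streams that no aim in the plan calls for."""
    needed = {item for aim in question.aims for item in aim.data}
    return [stream for stream in planned_streams if stream not in needed]


question = ResearchQuestion(
    text="What is the median electricity end-use profile for each apartment type?",
    aims=[
        Aim(
            label="1.1",
            description="Characterize hourly end-uses/costs",
            evaluation="summative",  # assumed: long-term monitoring in service
            methods=["circuit-level energy monitoring", "empirical analysis"],
            data=["circuit-level kWh", "outdoor temperature"],  # hypothetical names
        ),
    ],
)

# Streams someone proposes to log; the last one answers no stated aim.
planned = ["circuit-level kWh", "outdoor temperature", "CO2 concentration"]
extra = unjustified_streams(question, planned)
print(extra)
```

Anything the check returns is a candidate for either dropping or for a new, explicitly stated research question.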

Example of Open Source Building Performance Data

If you have made it this far and want to see an example output of an open Building Performance Dataset and the supporting data descriptor, you can check out the example below or those linked in the references. Thank you to the OpenHVAC community for the platform to discuss this important topic; I am excited to see what this community can do together.

Data descriptor: Agee, P., Nikdel, L.*, & Roberts, S. (2021). A Measured Energy Use, Solar Production, and Building Air Leakage Dataset for a Zero Energy Commercial Building. Scientific Data 8, 299.
Open dataset: Agee, P. & Nikdel, L. (2021). An energy use, energy production, and building air leakage dataset for a zero energy commercial building. 


1. Roth, J., Lim, B., Jain, R. K. & Grueneich, D. Examining the feasibility of using open data to benchmark building energy usage in cities: A data science and policy perspective. Energy Policy 139, 111327 (2020).

2. Granderson, J. et al. Accuracy of automated measurement and verification (M&V) techniques for energy savings in commercial buildings. Applied Energy 173, 296–308 (2016).

3. Huebner, G. M. & Mahdavi, A. A structured open data collection on occupant behaviour in buildings. Scientific Data 6, 292 (2019).

4. Kriechbaumer, T. & Jacobsen, H.-A. BLOND, a building-level office environment dataset of typical electrical appliances. Scientific Data 5, 180048 (2018).

5. Kelly, J. & Knottenbelt, W. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Scientific Data 2, 150007 (2015).

6. Mahdavi, A., Berger, C., Tahmasebi, F. & Schuss, M. Monitored data on occupants’ presence and actions in an office building. Scientific Data 6, 290 (2019).

7. Ruhnau, O., Hirth, L. & Praktiknjo, A. Time series of heat demand and heat pump efficiency for energy system modeling. Scientific Data 6, 189 (2019).

8. Schweiker, M., Kleber, M. & Wagner, A. Long-term monitoring data from a naturally ventilated office building. Scientific Data 6, 293 (2019).

9. Miller, C. et al. The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition. Scientific Data 7, 368 (2020).

10. Rashid, H., Singh, P. & Singh, A. I-BLEND, a campus-scale commercial and residential buildings electrical energy dataset. Scientific Data 6, 190015 (2019).

11. Paige, F., Agee, P. & Jazizadeh, F. flEECe, an energy use and occupant behavior dataset for net-zero energy affordable senior residential buildings. Scientific Data 6, 291 (2019).

12. Klemenjak, C., Kovatsch, C., Herold, M. & Elmenreich, W. A synthetic energy dataset for non-intrusive load monitoring in households. Scientific Data 7, 108 (2020).

13. Hartson, R. & Pyla, P. S. The UX Book: Agile UX Design for a Quality User Experience (Morgan Kaufmann, 2018).
