Using AI to extract insights from millions of multifamily housing reviews

Online user reviews are a key source of information in many industries and can influence both consumer and business decisions. Real estate is no different: tenant reviews of the properties in which they live reveal valuable information about the living experience. At JLL Technologies, we asked whether we could extract useful information from this data to help our clients better understand a property they would like to sell, purchase, or refinance.

Online tenant reviews typically run from a few dozen to a few hundred words. However, given the rapid growth in the number of reviews and the hundreds of thousands of multifamily residences, manually analyzing them for insight is often impractical. Some properties alone have thousands of reviews, and comparing a property with similar assets, a common task in commercial real estate, would require days of reading.

Motivated by the potential value contained in tenant reviews and the need for automation to extract it, we asked whether modern AI techniques could be used to gain insights. We first collected 5.5 million online tenant reviews covering the entire United States. The dataset included reviews dating back to 2001 and covered nearly 100,000 properties. It contained more than half a billion words, approximately 15% of the size of Wikipedia.

Given this large textual dataset, or corpus in scientific terms, what kind of information should we try to extract? We collaborated with real estate experts at our company to answer this question. Through a series of interviews, these domain experts highlighted the importance of detecting quality-of-life issues, especially ones that are costly or difficult to remedy. For example, reducing crime levels in a property’s vicinity is generally beyond the capability of a particular owner or management group, yet crime has a major influence on the attractiveness of a property for a prospective buyer (or tenant, for that matter). Ultimately, we settled on four specific quality-of-life issues:

• Crime: Is there severe crime at or near the property?
• Noise: Is there regular noise nuisance and/or are there thin walls?
• Parking: Is there limited or insufficient parking?
• Pests: Is there a recurrent pest problem (roaches, mice)?

Before applying AI techniques, specifically deep learning models, we created a small human-annotated dataset covering 0.1% of the reviews to enable model training and, more importantly, evaluation of results. To do so, we randomly sampled thousands of reviews and crowdsourced the labeling, using up to nine human labelers per review. The labelers received detailed instructions beforehand, and the final label for each review was chosen by majority vote. All labelers agreed on 83% of reviews; the majority vote decided the remaining 17%, which lacked full consensus.
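The majority-vote step can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the review ids, the vote lists, and the rule that a tie counts as "issue not present" are all assumptions made for the example.

```python
from collections import Counter

def majority_vote(votes):
    """Return (label, unanimous) for one review's list of boolean votes.

    A tie resolves to False (issue not present). That tie rule is an
    assumption of this sketch, not something the labeling setup specifies.
    """
    counts = Counter(votes)
    label = counts[True] > counts[False]
    unanimous = len(counts) == 1
    return label, unanimous

def aggregate(votes_by_review):
    """Map review id -> final label, and report the full-consensus share."""
    labels = {}
    unanimous_count = 0
    for review_id, votes in votes_by_review.items():
        label, unanimous = majority_vote(votes)
        labels[review_id] = label
        unanimous_count += unanimous
    consensus_share = unanimous_count / len(votes_by_review)
    return labels, consensus_share

# Hypothetical votes for the "noise" label on three reviews.
example_votes = {
    "r1": [True, True, True],
    "r2": [False, False, False, False, False],
    "r3": [True, False, True, True, False],
}
final_labels, consensus = aggregate(example_votes)
```

Here `consensus` plays the role of the 83% figure above: the share of reviews on which every labeler agreed.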

The manual labeling process entailed several steps. These included comparing vendors, examining different platforms, and refining the labeling instructions (for example, deciding what specifically should count as a “constant” noise issue). This process paid off, not only by providing a high-quality dataset, but also by helping us better understand the nuances of each quality-of-life label. Below is the resulting label distribution.

Finally, we were ready to do what data scientists love doing: run state-of-the-art machine learning models on large data. We compared classic natural language processing approaches, such as bag-of-words, with more modern neural language representations, notably pretrained BERT-based language models (Devlin et al., 2018) that leverage the transformer architecture (Vaswani et al., 2017). We also fine-tuned the BERT models to better fit the specific linguistic properties of our corpus, using self-supervised training on half of the corpus’ reviews. Because this step is self-supervised, it required no human labels.
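To make the classic baseline concrete, here is a minimal bag-of-words sketch in plain Python. The tokenization rule and the two sample reviews are assumptions chosen for illustration; a real pipeline would typically use a library vectorizer and feed the vectors to a classifier.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(texts):
    """Assign each distinct token a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(text, vocabulary):
    """Return a count vector; word order and unknown tokens are discarded."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Hypothetical reviews, invented for this example.
reviews = [
    "Thin walls, I hear the neighbors every night",
    "Plenty of parking and quiet neighbors",
]
vocab = build_vocabulary(reviews)
vectors = [bag_of_words(r, vocab) for r in reviews]
```

Because the representation throws away word order, "no noise at all" and "noise all the time" look similar to it. Contextual models such as BERT are designed to capture that difference.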

Evaluation was done using five-fold cross-validation. Given that the label distribution is somewhat imbalanced, we used various metrics, including F1, to understand the results.
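The sketch below shows one way to build five folds and compute F1 in plain Python. Stratifying the folds by label is an assumption of this example (the text above only says five-fold), and the 10% positive rate is made up to show why accuracy alone can mislead on imbalanced labels.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, keeping each fold's positive rate similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for value in (True, False):
        indices = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets an even share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for the positive class; 0.0 if no true positives."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical label set with a 10% positive rate.
labels = [True] * 10 + [False] * 90
folds = stratified_folds(labels, k=5)

# A model that never flags anything looks accurate but has zero F1.
all_negative = [False] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, all_negative)) / len(labels)
baseline_f1 = f1_score(labels, all_negative)
```

In each round, one fold serves as the test set and the other four as training data, so every labeled review is evaluated exactly once.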

Overall, the models, especially the fine-tuned BERT models, identified the four labels well, with a high area under the receiver operating characteristic curve (AUROC) of 0.95 and top F1 scores per label ranging from 0.5 to 0.79.
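For readers less familiar with AUROC, it can be read as the probability that a randomly chosen positive review receives a higher model score than a randomly chosen negative one. The sketch below computes it directly from that definition; the labels and scores are invented for the example.

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half. This pairwise form costs O(P * N) comparisons,
    which is fine for a small labeled set but not for millions of reviews.
    """
    positives = [s for y, s in zip(y_true, scores) if y]
    negatives = [s for y, s in zip(y_true, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and model scores for five reviews.
y = [True, True, False, False, False]
s = [0.9, 0.4, 0.5, 0.2, 0.1]
example_auroc = auroc(y, s)  # 5 of the 6 positive/negative pairs are ordered correctly
```

Unlike F1, AUROC does not depend on a decision threshold, which is one reason to report both.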

As expected, the fine-tuned models required less labeled data to reach the same performance as their counterparts that were not fine-tuned. Below we see learning curves comparing RoBERTa models without fine-tuning (black lines) with fine-tuned RoBERTa models (colored lines). AUROC improved as the models were trained on progressively more labeled data; shading represents the standard deviation across individual cross-validation folds.
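The line and shaded band of a learning curve like this can be derived from per-fold scores as sketched below. The numbers are placeholders, not results from our study, and the choice of population standard deviation is an assumption of the example.

```python
from statistics import mean, pstdev

def summarize_curve(fold_scores_by_size):
    """For each training-set size, return (size, mean, lower, upper).

    lower/upper are the mean minus/plus one population standard deviation
    across folds, i.e. the edges of the shaded band.
    """
    summary = []
    for size in sorted(fold_scores_by_size):
        scores = fold_scores_by_size[size]
        center = mean(scores)
        spread = pstdev(scores)
        summary.append((size, center, center - spread, center + spread))
    return summary

# Placeholder per-fold AUROC values keyed by number of labeled reviews.
# These are NOT results from the study.
placeholder = {
    100: [0.70, 0.74, 0.72, 0.68, 0.76],
    1000: [0.88, 0.90, 0.89, 0.91, 0.87],
}
curve = summarize_curve(placeholder)
```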

We then used the fine-tuned RoBERTa model to label the entire dataset for downstream analysis. Having all 5.5 million reviews labeled by AI enabled analyses that would otherwise have been impossible: using the tenant perspective on properties to conduct large-scale real estate analyses, both locally and nationwide.

For example, we could automate property comparisons, analyze geographic and temporal trends in multifamily properties, and even raise key business considerations for prospective investments. In the scientific publication describing this work, we correlated asset quality grades with review labels, examined the relationship between property age and the reviews, and compared FBI municipal crime statistics with tenant reviews related to crime levels. In each case, we found a positive association between review sentiment and the attribute in question.
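As a rough illustration of this kind of analysis, the sketch below aggregates model labels into a per-property rate and correlates it with an external measure. The field names (`property_id`, `crime`), the sample data, the external crime index, and the choice of Pearson correlation are all assumptions for the example; the paper describes the actual methodology.

```python
from collections import defaultdict
from math import sqrt

def label_rate_by_property(reviews, label):
    """Share of each property's reviews that the model flagged with `label`."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for review in reviews:
        total[review["property_id"]] += 1
        flagged[review["property_id"]] += bool(review[label])
    return {pid: flagged[pid] / total[pid] for pid in total}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical model-labeled reviews for three properties.
labeled_reviews = [
    {"property_id": "A", "crime": True},
    {"property_id": "A", "crime": False},
    {"property_id": "B", "crime": False},
    {"property_id": "B", "crime": False},
    {"property_id": "C", "crime": True},
    {"property_id": "C", "crime": True},
]
# Hypothetical external crime index per property's municipality.
crime_index = {"A": 30.0, "B": 10.0, "C": 50.0}

rates = label_rate_by_property(labeled_reviews, "crime")
properties = sorted(rates)
r = pearson([rates[p] for p in properties], [crime_index[p] for p in properties])
```

The same per-property rates also support side-by-side comparisons of similar assets without reading individual reviews.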

On the business side, our company has already used AI-based tenant review analysis to help clients better understand their properties ahead of a sale.

Beyond reviews, the use of AI and alternative data is expected to have increasingly far-reaching implications in commercial real estate. At JLL Technologies, we hope to continue using machine learning to help digitalize real estate for the benefit of our company and our clients.

For more details on this work, see the full paper.

References:
Devlin et al. (2018)
Vaswani et al. (2017)