Data Storytelling and smart meter energy analytics - I
some thoughts on creating richness out of sparse data
- Prelude
- Data Storytelling
- Dynamic Data Storytelling
- Stories: Telling truth without many facts
- Automatic, Dynamic-Data Storytelling
- Personalized, Automatic, Dynamic-Data Storytelling
Prelude
I ended my last blog entry by asserting that energy analytics could be a key differentiator in France's crowded energy distribution market. I even hypothesized that some aspects of these analytics might become a default expectation for consumers (since EDF already provides them).
"The Linky energy meter collects barely any data," a Machine Learning engineer might say (24 samples per day). From that perspective, extracting deep insights is impractical. This is why companies like Smart Impulse and Voltaware offer add-on sensors that sample current at a far higher frequency, from which much richer insights into household consumption can be mined.
The question is, what do customers want? I'm going to bet on a time-tested human trait that social media companies exploit: stories.
In this article, I trace my motivation by going through various forms of storytelling that use data and discussing their relevance to energy analytics. The idea for this article came from a series of discussions with my friend Florian M-B, who is exploring the use of storytelling in recommendation systems as part of his doctoral studies.
Data Storytelling
In his book Sapiens: A Brief History of Humankind, Yuval Harari contends that one of the reasons our species rose to prominence on Earth is our storytelling ability (and its extensions, such as imagined realities). In a way, humans are storytelling apes.
Data storytelling has become an indispensable part of modern-day journalism. Hans Rosling's Gapminder was one of the pioneers, creating insightful stories about global development from data collected over decades. That visualization approach has now become standard when reporting stories that affect millions of people.
The COVID-19 pandemic showed the full force of data storytelling. The Washington Post article 'Why outbreaks like coronavirus spread exponentially, and how to flatten the curve' by Harry Stevens is likely the most-read article on its website. Data storytelling has also become central to several key projects that share the motivations journalists typically espouse; sites such as Our World in Data aim to focus precisely on that.
Academic research has also long examined data storytelling and visualization [1], [2].
Dynamic Data Storytelling
One thing that emerged during the COVID-19 pandemic was a flurry of websites providing near-real-time updates on its spread. A once relatively obscure website like Worldometer saw its visits explode simply by aggregating COVID-19 numbers from different countries. During the initial stages of the lockdown, I would open the website several times a day to follow the dynamics. Several sites do such aggregation today (including excellent country-specific sites run by the respective governments). Some also provide forecasts based on models developed by their data scientists.
These sites go one step beyond journalistic data storytelling by setting up a platform that conveys a story by dynamically accumulating data from multiple sources. Although the administrators behind each site update several aspects of the visualization themselves, the result has an automatic-storytelling feel. These sites inspired me to consider how this idea could be extended further.
Stories: Telling truth without many facts
Energy data analytics is challenging when only limited data is available. This is why the current capabilities based on Linky data are limited to:
- Daily consumption curve
- Yearly breakdown of consumption by category
- Comparison of individual consumption with similar households
I know that the yearly consumption breakdown is wrong in my case (the algorithm misidentifies my hot-plate stove load as water-heating load, which is understandable since both are resistive loads with similar usage characteristics in my house). I also have anecdotal evidence that the similar-household comparison was misleading for at least one person I know. So are these features pointless?
Neil Gaiman, a master storyteller of our generation, once said of his transition from journalism to writing fiction, "I wanted to be able to tell the truth without ever needing to worry about the facts." While it might sound preposterous, data storytelling for energy analytics with sparse data might have to take a leaf out of his book. That is, even when we lack facts (data), we can strive to convey the truth (energy awareness) by creating stories using:
- Universal commonalities (using statistics of all the data collected from everyone)
- Unusual occurrences (observing the anomalies in the smart meter data)
- Influences of natural and social elements (using weather data and social events such as holidays)
- Legends from the past (historical data)
- The mental picture of a hero (energy consumption model of a particular household)
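As a rough illustration of how two of these ingredients, unusual occurrences and legends from the past, could combine, here is a minimal sketch that flags days whose consumption departs sharply from the household's own recent history. It assumes daily kWh totals have already been aggregated from the meter readings and uses a simple trailing z-score; the function name `flag_unusual_days`, the 14-day window and the 2-sigma threshold are hypothetical choices, not what any utility actually uses.

```python
from statistics import mean, stdev


def flag_unusual_days(daily_kwh, window=14, threshold=2.0):
    """Return the indices of days whose consumption deviates from the
    preceding `window` days by more than `threshold` standard deviations.

    Hypothetical helper: window and threshold are illustrative defaults.
    """
    flagged = []
    for i in range(window, len(daily_kwh)):
        history = daily_kwh[i - window:i]  # the household's own recent past
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where the z-score is undefined.
        if sigma > 0 and abs(daily_kwh[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    readings = [10.0, 11.0] * 7 + [25.0]  # two quiet weeks, then a spike
    print(flag_unusual_days(readings))
```

A flagged day is not a story by itself, but it is a natural hook for one: pairing it with the weather or a holiday calendar can turn an anomaly into an explanation.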
In his talk at the 2018 NILM Workshop, Oliver Parson described the question that keeps him awake at night: "Are our insights useful for customers to save energy?" I think part of this worry stems from data scientists expecting a cause-effect relationship between tips and energy savings, while knowing that this causal link is weak given the lack of data.
One way to change this situation is to recognize that energy awareness works like motivation. A single message might keep someone motivated for a few days at most, and repeating a message quickly creates fatigue. Motivation is sustained through a personal touch and novelty in the messages received. This is the kind of effect an energy data scientist should aim to create.
Even though the visualization EDF generated for me is erroneous in its results, it gave me enough motivation to look into my energy usage. For example, I used the daily consumption summary to infer the energy consumed by individual appliances (based on the weekday-weekend difference, whether I was at home, etc.). This exploration left me content with my balance between comfort and cost.
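The kind of back-of-the-envelope inference described above can be sketched as a comparison of mean daily consumption between two groups of days. This is a minimal sketch, assuming a hypothetical list of `(kwh, at_home)` pairs assembled by hand from the daily summary; the same function covers the weekday-weekend split if the flag marks weekdays instead. It ignores weather and other confounders, so the result is only a rough estimate of the load that appears when someone is home.

```python
from statistics import mean


def occupancy_load_estimate(days):
    """Estimate the extra daily consumption on days someone is at home.

    `days` is a list of (kwh, at_home) pairs (hypothetical format). Baseline
    loads such as the fridge show up on both kinds of day, so the difference
    in means roughly isolates appliances used only when someone is home.
    """
    home = [kwh for kwh, at_home in days if at_home]
    away = [kwh for kwh, at_home in days if not at_home]
    if not home or not away:
        raise ValueError("need at least one day of each kind")
    return mean(home) - mean(away)


if __name__ == "__main__":
    sample = [(12.0, True), (14.0, True), (5.0, False), (7.0, False)]
    print(occupancy_load_estimate(sample))
```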
Automatic, Dynamic-Data Storytelling
Automatically generating infographics from multiple data sources is widespread today, thanks to the explosion in the number of sensors used to monitor complex systems. During my postdoctoral research, we worked with TLGPro and their Neotool, which visualizes data from a multitude of sources. Several Building Energy Management System (BEMS) software products essentially do the same.
In a way, NILM algorithms relying on high-frequency data build on top of such visualizations and provide analytics and insights to the user. One significant difference, however, is the target audience. For complex visualization tools, the audience is an expert (a factory or building manager), whereas for energy analytics it is the average consumer. Hence, the data storytelling techniques that journalists use become more pertinent.
Personalized, Automatic, Dynamic-Data Storytelling
Beyond the question of the target audience, energy data analytics involves an element of 'personalization'. A user's consumption data is visible only to them, and they can see only an anonymized, aggregated version of others' data. Constraints, however, can also help in navigating the chaos of endless choices.
This perspective does not necessarily change the insights utilities currently offer consumers, but it can change how they are offered:
- Besides listing daily consumption and the weather for each day, one can display a weekly graph and compare one week with another when there is a significant temperature change.
- Along with the 'comparison with similar households', one can show how similar households' consumption changed during a vacation period compared to yours, or how weather patterns changed their mean consumption compared to the individual's.
- If individuals concerned by a peak in their consumption see that it was universal, caused by a weather pattern, they may be less alarmed. If not, they may treat it as a call to action.
- Provide a forecast of what the individual's monthly or yearly bill would be if their consumption pattern is maintained.
- Present the user with different paths to a particular energy bill, for example, shifting consumption to off-peak periods.
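The bill forecast and the 'different paths' idea from the list above can be sketched together. The hypothetical function below projects a monthly bill assuming the current daily peak and off-peak consumption continues unchanged, and lets the user explore what happens if a share of peak consumption moves to off-peak hours. The two-rate tariff structure and any prices passed in are placeholders for illustration, not actual EDF rates.

```python
def project_monthly_bill(peak_kwh_per_day, offpeak_kwh_per_day,
                         peak_price, offpeak_price, fixed_monthly=0.0,
                         days=30, shift_fraction=0.0):
    """Project a monthly bill assuming today's daily pattern continues.

    Hypothetical helper: `shift_fraction` moves that share of daily peak
    consumption to off-peak hours, keeping total consumption constant.
    """
    if not 0.0 <= shift_fraction <= 1.0:
        raise ValueError("shift_fraction must be between 0 and 1")
    shifted = peak_kwh_per_day * shift_fraction
    peak_kwh = (peak_kwh_per_day - shifted) * days
    offpeak_kwh = (offpeak_kwh_per_day + shifted) * days
    return fixed_monthly + peak_kwh * peak_price + offpeak_kwh * offpeak_price


if __name__ == "__main__":
    # Placeholder prices per kWh and a placeholder fixed monthly charge.
    for fraction in (0.0, 0.25):
        bill = project_monthly_bill(8.0, 4.0, 0.20, 0.15,
                                    fixed_monthly=10.0, shift_fraction=fraction)
        print(f"shift {fraction:.0%}: {bill:.2f}")
```

With 8 kWh of daily peak and 4 kWh of daily off-peak consumption at these placeholder prices, the 30-day projection drops from 76 to 73 when a quarter of peak consumption is shifted, which is the kind of side-by-side comparison a "path to a bill" story could present.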
And so on. A lot of this depends on how much data the customer is willing to share with the utility company. And making these visualizations automatic (or at least semi-automatic) has its own set of challenges.
Automatic storytelling is a popular field in AI that extends beyond summarization and plot-point generation [3]. Metrics used for machine translation, such as BLEU and METEOR, or for summarization, such as ROUGE, can be good starting points for evaluating generated stories.
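To make metric-based evaluation concrete, here is a minimal sketch of unigram-overlap recall, the core idea behind ROUGE-1: the fraction of words in a reference text that also appear in the generated text, with repeated words counted at most as often as they occur in the generated text. It is deliberately simplified (lowercasing and whitespace tokenization only, no stemming), the function name is hypothetical, and established ROUGE implementations handle tokenization and the other ROUGE variants more carefully.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Share of reference unigrams also found in the candidate (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Each reference word counts at most as many times as it appears in the candidate.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


if __name__ == "__main__":
    reference = "your consumption rose during the cold week"
    generated = "consumption rose in the cold week"
    print(rouge1_recall(generated, reference))
```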
There is significant academic work on applying storytelling techniques to time-series data [4]. I plan to spend more time on these and on works such as [5] to better formalize some automatic storytelling mechanisms. I also plan to develop some example visualizations for these ideas.
[1] Gratzl et al., From Visual Exploration to Storytelling and Back Again, 2016.
[2] Ojo and Heravi, Patterns in Award Winning Data Storytelling: Story Types, Enabling Tools and Competences, 2017.
[3] Yao et al., Plan-and-Write: Towards Better Automatic Storytelling, 2019.
[4] Battad and Si, Apply Storytelling Techniques for Describing Time-Series Data, 2018.
[5] Zhu et al., A survey on automatic infographics and visualization recommendations, 2020.