# COVID-19 World Survey Data API

## Disclaimer

We are still fine-tuning our data weighting, aggregation, and smoothing methods. As a result, the query responses returned by the API are subject to change.

These COVID-19 indicators are derived from global symptom surveys that are placed by Facebook on its platform. The surveys ask respondents how many people in their household are experiencing COVID-like symptoms, among other questions. These surveys are voluntary, and individual survey responses are held by the University of Maryland and are shareable with other health researchers under a data use agreement. No individual survey responses are shared back to Facebook. Using this survey response data, we estimate the percentage of people in a given geographic region on a given day who have CLI (COVID-like illness = fever, along with cough, shortness of breath, or difficulty breathing) or ILI (influenza-like illness = fever, along with cough or sore throat).
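The CLI and ILI definitions above can be written as two small predicates. The following is a minimal sketch, not the survey pipeline itself: the `Symptoms` class, its field names, and the unweighted `percent` helper are assumptions made for illustration, and the published estimates additionally apply survey weights.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    # Hypothetical field names; the real survey items may be coded differently.
    fever: bool = False
    cough: bool = False
    shortness_of_breath: bool = False
    difficulty_breathing: bool = False
    sore_throat: bool = False


def has_cli(s: Symptoms) -> bool:
    """COVID-like illness: fever plus cough, shortness of breath, or difficulty breathing."""
    return s.fever and (s.cough or s.shortness_of_breath or s.difficulty_breathing)


def has_ili(s: Symptoms) -> bool:
    """Influenza-like illness: fever plus cough or sore throat."""
    return s.fever and (s.cough or s.sore_throat)


def percent(flags) -> float:
    """Unweighted percentage of True values (assumption: no survey weights applied)."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else float("nan")
```

Note that a person with fever and cough satisfies both definitions, while fever with only a sore throat counts as ILI but not CLI.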

The detailed method description is listed in this document. Our API design is inspired by Delphi’s COVIDcast API from Carnegie Mellon University. If you are interested in other open epidemiological data APIs, check out delphi-epidata.

### Live Estimates

The daily COVID-19 indicators returned by our data API represent our best estimates given all the data available to date. Estimates for the current day typically become available two days later because of the data weighting and aggregation process.

### Smoothed Estimates

For each smoothed indicator, our estimates are derived using data smoothing techniques (akin to averaging, or weighted averaging) across a one-week window. Smoothed estimates aggregate survey responses from multiple days for a geographic region. As a result, more geographic regions will have smoothed estimates than daily live estimates.
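To illustrate why pooling a week of responses yields estimates for more regions, here is a rough sketch that pools counts over a trailing 7-day window and only reports a value once the pooled sample is large enough. The function name, the trailing (rather than centered) window, the unweighted pooling, and the `min_responses=100` threshold are all assumptions for illustration; the actual smoothing method and reporting threshold are described in the methods document.

```python
def smoothed_percent(daily_counts, window=7, min_responses=100):
    """Pool (positives, respondents) pairs over a trailing window.

    daily_counts: list of (positives, respondents) tuples, oldest day first.
    Returns one percentage per day, or None where the pooled sample is
    below min_responses (a hypothetical reporting threshold).
    """
    estimates = []
    for i in range(len(daily_counts)):
        chunk = daily_counts[max(0, i - window + 1): i + 1]
        positives = sum(p for p, _ in chunk)
        respondents = sum(n for _, n in chunk)
        if respondents >= min_responses:
            estimates.append(100.0 * positives / respondents)
        else:
            estimates.append(None)
    return estimates


# A region with 40 responses per day never reaches the threshold on a
# single day (window=1), but does once three or more days are pooled.
week = [(1, 40)] * 7
daily = smoothed_percent(week, window=1)
pooled = smoothed_percent(week, window=7)
```

With these made-up numbers, `daily` contains no reportable values, while `pooled` has an estimate from the third day onward.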

### Missing Estimates

Generally, we do not report estimates at locations with insufficient survey responses (or insufficiently recent data).

## Release Log

• v2.1 (2021-01-20)

• We have implemented an updated definition for some of the aggregate indicators in the UMD Open Data API. This definition change impacts the following indicators: mask-wearing, financial worry, social distancing, and vaccine acceptance. We are implementing this change to make our estimates more directly comparable to those in Carnegie Mellon University Delphi research group’s COVIDcast API, which uses data from the US version of the COVID-19 Symptom Survey.

• Previously, we calculated estimates for the above indicators as percent of all survey respondents (where a survey response was defined as, at minimum, an answer to the country and symptom questions). We are now calculating estimates for the above indicators as percent of item-level respondents (i.e., where the sample for each indicator is survey respondents who also provided a valid response to the item of interest). That is, the denominator of the calculation now excludes people who did not respond to the particular item.

• In addition, we have updated the anosmia definition to be in line with that used for calculating CLI/ILI. That is, the anosmia estimates are calculated as a percent of all survey respondents (as mentioned above, for all calculations in our API, a survey respondent is defined as someone who answered the country question and provided any response to the symptom question).

• The CLI/ILI indicators have not changed.

• We have implemented these changes and backfilled the API. We have also added sample size for each indicator, given that these will now vary depending on item-level response.
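The v2.1 denominator change can be illustrated with a small worked example. This is a sketch under stated assumptions: the `percent_of` helper and the `wears_mask` item name are hypothetical, responses are unweighted, and a missing or `None` value stands for an item the respondent skipped.

```python
def percent_of(responses, item, denominator="item"):
    """Return (percent answering yes, sample size) for one survey item.

    denominator="item": only respondents who answered this item (v2.1 behavior).
    denominator="all":  every survey respondent (pre-v2.1 behavior).
    """
    if denominator == "item":
        base = [r for r in responses if r.get(item) is not None]
    else:
        base = list(responses)
    if not base:
        return None, 0
    yes = sum(1 for r in base if r.get(item) is True)
    return 100.0 * yes / len(base), len(base)


# Four respondents: one yes, one no, two who skipped the item.
responses = [
    {"wears_mask": True},
    {"wears_mask": False},
    {"wears_mask": None},
    {},
]
new_estimate = percent_of(responses, "wears_mask", denominator="item")
old_estimate = percent_of(responses, "wears_mask", denominator="all")
```

Here the item-level estimate is 50% with a sample size of 2, whereas the earlier all-respondent calculation gives 25% with a sample size of 4, which is why the per-indicator sample size now varies.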

• v2.0 (2020-12-06)

• We released an update to our API. Before this update, the definitions of COVID-like illness (CLI) and influenza-like illness (ILI) had been inadvertently switched in the API data. We have implemented a fix that corrects both historic and future data. Click here to read a detailed summary of the issue and fix.

• This issue does not affect individual-level survey responses that are provided to entities who have signed a microdata use agreement. It also does not affect the US data which has been collected and distributed by Carnegie Mellon University. US estimates in the UMD and Facebook maps are also unaffected.

• v1.2 (2020-08-31)

• We are now requiring that a participant provides a valid response to the symptom question, that is, that they select yes or no for at least one symptom but not yes for all symptoms. This change affects both the aggregate data and individual-level data.

• We removed respondents from both the individual-level data and aggregate data if they had a weight greater than 12 times the mean or less than 1/30 times the mean within their country/region or administrative region. This change affects both the aggregate data and individual-level data.

• We corrected a miscoding of administrative region that affected aggregate estimates for 24 administrative regions globally. This miscoding occurred because of a bug in the raw data being reported to us from Qualtrics which incorrectly mapped administrative regions with the same name across multiple countries to the wrong numeric codes. This caused respondents from specific regions to be mapped to the incorrect administrative regions and corresponding countries. This change affects both the aggregate data and individual-level data.

• We have revised the regions used in aggregation of responses from the United Kingdom to reflect NUTS1 regions. Individual response data will continue to include the more granular self-reported region provided by the respondent. The administrative regions being provided to respondents in the United Kingdom vary in granularity across countries. Currently, the survey offers the following options to respondents who select United Kingdom as their country:

• England: Government Office Regions
• Scotland: Council areas
• Wales: Counties
• Northern Ireland: Counties
• For countries other than England, the administrative regions included in the survey are quite small and thus we frequently do not meet the minimum threshold of individual responses per region for reporting aggregate estimates of CLI/ILI. Additionally, we are sampling at the country level (i.e., England, Scotland, Wales, Northern Ireland), so calculating CLI/ILI for these smaller regions may lead to biased estimates. To mitigate this issue in the short term, we have re-mapped the more granular regions in the survey to match NUTS1 statistical regions. In the long term, we are working to revise the list of regions being used in our sampling and survey item to better reflect commonly used statistical regions within the United Kingdom. Data that were released before this update (2020-04-23 to 2020-06-07) have been updated retroactively. This change affects only the aggregate data.
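The two respondent filters introduced in v1.2 (the valid symptom response requirement and the weight trimming) can be sketched as follows. The function names, the dictionary layout, and the choice to compute the mean once per group in a single pass are assumptions for illustration; the source only states the rules, not how they are implemented.

```python
from statistics import mean


def is_valid_symptom_response(answers):
    """answers maps symptom name -> True (yes), False (no), or None (unanswered).

    Valid when at least one symptom was answered and the respondent
    did not answer yes to every symptom.
    """
    values = list(answers.values())
    answered_any = any(v is not None for v in values)
    yes_to_all = bool(values) and all(v is True for v in values)
    return answered_any and not yes_to_all


def trim_by_weight(respondents, upper=12.0, lower=1.0 / 30.0):
    """Drop respondents whose weight is more than `upper` times, or less than
    `lower` times, the mean weight of their group.

    respondents: list of dicts with hypothetical keys "region" and "weight".
    """
    weights_by_region = {}
    for r in respondents:
        weights_by_region.setdefault(r["region"], []).append(r["weight"])
    means = {region: mean(ws) for region, ws in weights_by_region.items()}
    return [
        r for r in respondents
        if lower * means[r["region"]] <= r["weight"] <= upper * means[r["region"]]
    ]
```

For example, in a group of 23 respondents with weight 1.0 plus one with weight 100 and one with weight 0.01, the mean is about 4.92, so the bounds are roughly 0.164 and 59.0 and only the two extreme respondents are removed.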

• v1.1 (2020-06-09)

• Updated the calculation formula of the smoothed_cli_se and smoothed_ili_se fields in both country and regional smoothed estimates. The new formula takes the square root of the original values.
• Data that were released before this update (2020-04-23 to 2020-06-07) have been updated retroactively.
• v1.0 (2020-05-31)