Crowdsourced Climate Impact Surveillance provides real-time mapping of self-reported climate impacts by scraping geotagged Twitter data.
Cities globally face significant risks from climate change. Urban areas are home to >50% of the world’s population, are growing rapidly, and often concentrate economic activity, population, and infrastructure in high risk locations. Many of the largest cities are located in coastal areas, for instance, and are thus exposed to projected increases in sea level, storm activity, and associated flooding (Hanson et al., 2011). Given these threats, cities are taking an increasingly active role in climate policy action, with mitigation experiences well-documented (Castán Broto and Bulkeley, 2013). There are also widely referenced examples of city leadership and action on adaptation (e.g. New York, London) (Rosenzweig and Solecki, 2014; Wilbanks, 2011).
Here we propose and develop a methodology for scraping Twitter and spatially visualizing reports of climate impacts in real time. Municipal governments, academic research groups, and non-governmental organizations have led most existing efforts to record climatic impacts, understand the vulnerability of urban populations, and devise adaptation programmes, policies, and actions. However, there have been few attempts, if any, to crowdsource reporting of climatic impacts and vulnerability.
Short term: Our objective is to create a cartographic map with overlaid Tweets reporting weather impacts in real time.
Long term (50 years): Our goal is to shift climate impacts surveillance from a centralized, resource intensive system with a top-down rationale to a flexible, real-time, self-reported, crowdsourced, and participatory method. The goal here is not only to map, but to incentivize a shift in the fundamental manner in which weather is surveilled.
What actions do you propose?
Twitter scraping in real-time
We will create a Twitter-specific algorithm to identify, filter, and retrieve relevant Tweets reporting climate impacts for the city of New York in real time. From each retrieved Tweet we will gather the location (geo-tagged [coordinates] or reported), the content of the Tweet, the time-stamp, and a record of any hashtags used. Based on each Tweet’s written content, Tweets will be filtered into pre-defined climate impact categories (e.g., flooding, landslide, heat stress). Relevant content and hashtags will continually change as events happen. A Twitter scraping manager from the CCIS team will update the Twitter searching algorithm daily or weekly with relevant content to scrape (e.g., the name of a hurricane: #sandy).
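As a rough illustration of the filtering step described above, the following sketch sorts a Tweet into impact categories by keyword matching. The keyword lists, the `Tweet` record, and the function names are illustrative assumptions, not the CCIS team's actual algorithm; the proposal names the categories but not the search terms.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical keyword lists, assumed for illustration only.
# In practice the scraping manager would update these as events unfold.
IMPACT_KEYWORDS = {
    "flooding": {"flood", "flooded", "flooding", "#sandy"},
    "landslide": {"landslide", "mudslide"},
    "heat stress": {"heatwave", "heatstroke", "overheated"},
}


@dataclass
class Tweet:
    """The fields the proposal says will be gathered from each Tweet."""
    text: str
    timestamp: datetime
    coordinates: Optional[tuple] = None      # (lat, lon) when geo-tagged
    reported_location: Optional[str] = None  # free-text location otherwise
    hashtags: list = field(default_factory=list)


def extract_hashtags(text):
    """Return all hashtags in the text, lower-cased."""
    return [tag.lower() for tag in re.findall(r"#\w+", text)]


def categorize(tweet):
    """Return the sorted list of impact categories whose keywords appear."""
    tokens = set(re.findall(r"#?\w+", tweet.text.lower()))
    return sorted(
        category
        for category, words in IMPACT_KEYWORDS.items()
        if tokens & words
    )
```

A Tweet that matches no keyword returns an empty list and would simply be dropped from the map. Simple keyword matching is only a starting point and would likely produce false positives that the team would need to tune out.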
Tweets stored on data repository
Tweets will be continually updated and stored in a geo-database on a supercomputer at McGill University in the Climate Change Adaptation lab. The data repository will be monitored and managed by a CCIS database manager to ensure data accuracy, quality, and consistency. High-quality, temporally and spatially robust data will be made available by the database managers upon request for research purposes.
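One possible shape for such a repository is sketched below, using Python's built-in `sqlite3` module as a stand-in for the geo-database. The table layout, column names, function names, and the approximate New York City bounding box are all assumptions made for illustration; a production system would more likely use a database with native spatial indexing.

```python
import sqlite3

# Assumed schema: one row per retrieved Tweet.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id        INTEGER PRIMARY KEY,
    text      TEXT NOT NULL,
    posted_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp
    lat       REAL,
    lon       REAL,
    category  TEXT,
    hashtags  TEXT             -- space-separated
)
"""

# Approximate bounding box for New York City (south, west, north, east).
NYC_BOX = (40.4, -74.3, 41.0, -73.6)


def open_store(path=":memory:"):
    """Open (or create) the Tweet store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_tweet(conn, text, posted_at, lat, lon, category, hashtags):
    """Insert one categorized Tweet."""
    conn.execute(
        "INSERT INTO tweets (text, posted_at, lat, lon, category, hashtags) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (text, posted_at, lat, lon, category, " ".join(hashtags)),
    )
    conn.commit()


def tweets_in_box(conn, south, west, north, east):
    """Return (text, category) for Tweets inside the bounding box, oldest first."""
    rows = conn.execute(
        "SELECT text, category FROM tweets "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY posted_at",
        (south, north, west, east),
    )
    return rows.fetchall()
```

Calling `tweets_in_box(conn, *NYC_BOX)` would then return only the stored Tweets whose coordinates fall within the city.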
Open-source urban climate impacts map on the web
A base-map of the city of New York will be updated in real time with the approximate locations of categorized climate impact Tweets. Tweets will be represented thematically as coloured dots on the city base-map, each representing one or multiple climate impact categories (described above) and decipherable from a legend. Climate impact Tweet locations will remain on the base-map for a 24-hr period from the time they were reported. Temporal proximity in climate impact reporting will be indicated by the vibrancy of each dot’s colour: Tweets will fade in colour every couple of hours after they were first reported. The map will be interactive: website users can retrieve the original contents of the Tweets by clicking on the categorized climate impact locations. Users can also limit the map to display individual climate impacts through an online filter function, and a historical map interface option will let users retroactively retrieve Tweets from a chosen time period.
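The 24-hour retention and stepwise fading described above could be computed along the following lines. The two-hour fade step (our reading of "every couple of hours"), the minimum opacity, and the function name are assumptions for illustration; only the 24-hour window comes from the proposal.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)  # from the proposal
FADE_STEP = timedelta(hours=2)   # assumption for "every couple of hours"
MIN_OPACITY = 0.2                # assumption: faintest dot still visible


def dot_opacity(reported_at, now):
    """Return the dot's opacity (1.0 = freshest), or None if it should be hidden.

    Opacity drops in equal steps every FADE_STEP and the dot disappears
    once the Tweet is RETENTION old.
    """
    age = now - reported_at
    if age < timedelta(0) or age >= RETENTION:
        return None
    total_steps = RETENTION // FADE_STEP  # 12 steps with the values above
    step = age // FADE_STEP               # 0 for the newest Tweets
    fade_per_step = (1.0 - MIN_OPACITY) / (total_steps - 1)
    return round(1.0 - step * fade_per_step, 3)
```

The web map would call this for each stored Tweet on every refresh and skip any Tweet for which it returns `None`.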
Goal of First Year:
First month: July 2015
The CCIS research team purchases the necessary software and programming resources and hires a programmer.
Various combinations of Twitter scraping algorithms are tested and refined to maximize efficiency in capturing relevant tweets.
A geodatabase to store tweets in real time will be developed concurrently on a high-speed, high-storage computer in the CCIS team’s research lab.
Second - Fourth month: August - October 2015
First phase of development of Twitter scraping algorithm completed and linked to cloud-based geodatabase to test real-time updating and storing of tweets.
A function to categorize tweets into predefined weather/climate related impacts will be developed within the geodatabase.
An online cartographic map of New York City is developed by the web developer.
Fifth - Seventh month: November 2015 - January 2016 (Putting it all together)
The Twitter scraping algorithm is linked to the geodatabase, and the geodatabase to the online cartographic map, where the tweets will be visualized in real time.
Minimize the delay between when a tweet is posted by the Twitter user and when it appears on the online map.
Preparing online cartographic map for public viewing.
Eighth - Twelfth month: February - June 2016
Packaging Twitter scraping algorithm and web-based platform into clean, concise, and replicable code for application in other major cities impacted by climate change.
Website will be launched.
Website and application will be promoted to end users (disaster management responders, health officials, etc.) through personal connections and social media (e.g., Twitter), and to the general public (possible Tweeters) through social media.
Code for the entire package will be made open to the public on the website, both for transparency in our methodology and to allow for requests for improvement.
Proof of concept
Twitter is a widely used and rapidly growing microblogging website. In the first quarter of 2015, there were nearly 65,000,000 active Twitter users in the United States (Statista, 2015). Data from the Pew Research Center indicate that nearly one quarter of urban individuals with internet access now use Twitter, with over one third of users visiting the site daily (Duggan et al., 2015).
From detecting earthquakes to predicting disease outbreaks, evidence suggests that Twitter data can be put to work effectively on large-scale, complex problems (Sakaki, Okazaki, and Matsuo, 2010; Becker, Naaman, and Gravano, 2011; Heaivilin, Gerbert, Page, and Gibbs, 2011). In a 2012 paper, Louis and Zorlu stated that the H1N1 outbreak of 2011 could have been identified weeks earlier if symptoms had been digitally monitored on Twitter. This study emphasized the potential for Twitter to be used to detect and respond to monitored events, with Twitter creating a “new frontier” in surveillance.
Several studies have provided in-depth confirmation that Twitter can work effectively with weather and disaster related data. In a 2010 paper, Sakaki, Okazaki, and Matsuo discuss an algorithm that can actively locate earthquakes in Japan via tweets with a 96% success rate and deliver warning emails to residents faster than Japan’s public meteorological agency. Sarcevic et al. (2012) noted how Twitter was used during the 2010 Haitian earthquake as a decentralized way of coordinating medical assistance. Vieweg, Hughes, Starbird, and Palen (2010) analyzed tweets from two emergency weather events in the U.S., a flood and a grassfire, demonstrating Twitter’s potential in providing situational awareness during disaster scenarios. Chatfield, Scholl, and Brajawidagda (2014) gathered over 130,000 tweets from the three weeks surrounding Hurricane Sandy, noting the important role Twitter played in redistributing public service information.
Twitter also offers opportunities to understand how climate change is perceived and discussed more generally. Using tweets scraped from the U.S. between 2012 and 2013, Kirilenko, Molodtsova, and Stepchenkova found evidence indicating that Twitter can be used to pick up weather anomalies that users relate to climate change. Jang and Hart (2015) collected nearly 5.7 million tweets related to climate change over a two-year period. They were able to identify significant geopolitical variations in how Twitter users talk about climate change. These studies demonstrate that our project is not only realistic but can also be implemented effectively using existing software.
Who will take these actions?
The CCIS research team will undertake the proposed actions described above: planning and research, implementation of the case study, and launching of a website with real-time self-reported weather impacts. Private individuals will be able to access the website for their own perusal. The CCIS research team will also be responsible for monitoring and updating the accuracy of the Twitter scraping algorithm and tweaking its search functions when new events or relevant search terms emerge.
Where will these actions be taken?
This project will be limited to one city, New York, during the case study and initial website stages. Undertaking it at a regional or global level would require a large volume of Twitter data, putting constraints on logistical and computing resources. In the medium term (~2 years) our goal is to expand coverage to select other North American cities vulnerable to the impacts of climate change. If piloting proves successful and useful, we will aim to expand globally for worldwide crowdsourced weather surveillance.
What are other key benefits?
Benefits for disaster response units. Surveillance of climate related disaster events in major cities is constrained by the availability of resources, expertise, and capacity. Our proposal will provide disaster response units with information to monitor and track events, and predict outcomes, before or while they are happening, at zero cost to these departments.
Benefits for the global research community. Big datasets on climate/weather impacts, collected in real time and at exact or approximate locations, are costly and nearly impossible to assemble. Our Twitter data repository will provide researchers with large datasets that are temporally robust and spatially exact.
Contributions to technological innovation in climate surveillance and assessment. This concept could provide invaluable free and easily accessible information for disaster response units in municipalities in low-resource settings, who are also often the most vulnerable to the negative impacts of climate change.
What are the proposal’s costs?
Data collection: While it’s straightforward to collect basic Twitter data, more in-depth information can be gathered using software offering historical Twitter analytics ($199/month × 18 months = $3,582)
Data hosting: We want to make sure the data we’ve collected is available for quick and easy download from our website. To ensure this, we’re purchasing a high-performance SQL database that will allow our data to be stored in an accessible format in the cloud. ($168/month × 18 months = $3,024)
Web-design and service: We will hire a web-designer to help us build a website that will be connected to our SQL database. (One time cost = $2,500)
Cold storage: Everyone hates losing their data. To ensure this never happens to us, we will back up all of our data on a cold storage system (One-time cost = $900)
Total = $10,006
Immediate (Year 1): The accuracy and precision of the twitter scraping algorithm will be tested, validated, and improved. The map-based web platform for New York City will be developed and the method of updating the website in real-time will be explored. Relationships and partnerships with the municipal authorities responsible for climate/weather disaster monitoring and response in New York City will be made.
Short-term (5-15 years): The map-based web platform and Twitter scraping algorithm will be continually updated and improved. The concept will be expanded to other cities interested in this alternative, decentralized, and crowdsourced method of climate impact surveillance. The scraping algorithms and web platforms will be adapted and contextualized to each specific municipality based on their unique climate impact risks, Tweeting norms, and major languages used.
Long-term (15-100 years): Major cities will have a standardized web platform for climate impact surveillance, tailored to the specific risks each city predominantly faces. The method of data reporting will be expanded to new social media platforms (yet to be developed), and spatial information obtained from phones and computers will become more accurate and precise, improving reporting quality. This concept may shift climate impacts surveillance from a centralized, resource-intensive system with a comprehensive/rationalist planning framework to a flexible, real-time, self-reported, crowdsourced, participatory method. The long-term goal will be a shift in attitude and in the fundamental manner in which weather is surveilled.
References
Hanson, S., Nicholls, R., Ranger, N., Hallegatte, S., Corfee-Morlot, J., Herweijer, C., & Chateau, J. (2011). A global ranking of port cities with high exposure to climate extremes. Climatic Change, 104(1), 89-111.
Castán Broto, V., & Bulkeley, H. (2013). A survey of urban climate change experiments in 100 cities. Global Environmental Change, 23(1), 92-102.
Rosenzweig, C., & Solecki, W. (2014). Hurricane Sandy and adaptation pathways in New York: Lessons from a first-responder city. Global Environmental Change, 28, 395-408.
Wilbanks, T. J. (2011). Overview: Climate change adaptation in the urban environment. In J. D. Ford & L. Berrang-Ford (Eds.), Climate Change Adaptation in Developed Nations: From Theory to Practice (pp. 281-288). Dordrecht, Netherlands: Springer.
Sakaki, T., Okazaki, M., & Matsuo, Y. (2010, April). Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web (pp. 851-860). ACM.
Becker, H., Naaman, M., & Gravano, L. (2011). Beyond Trending Topics: Real-World Event Identification on Twitter. ICWSM, 11, 438-441.
Heaivilin, N., Gerbert, B., Page, J. E., & Gibbs, J. L. (2011). Public health surveillance of dental pain via Twitter. Journal of Dental Research, 90(9), 1047-1051.
Sarcevic, A., Palen, L., White, J., Starbird, K., Bagdouri, M., & Anderson, K. (2012, February). Beacons of hope in decentralized coordination: learning from on-the-ground medical twitterers during the 2010 Haiti earthquake. In Proceedings of the ACM 2012 conference on computer supported cooperative work (pp. 47-56). ACM.
Jang, S. M., & Hart, P. S. (2015). Polarized frames on “climate change” and “global warming” across countries and states: evidence from twitter big data. Global Environmental Change, 32, 11-17.
Chatfield, A. T., Scholl, H. J., & Brajawidagda, U. (2014, January). #Sandy Tweets: Citizens' Co-Production of Time-Critical Information during an Unfolding Catastrophe. In System Sciences (HICSS), 2014 47th Hawaii International Conference on (pp. 1947-1957). IEEE.
Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010, April). Microblogging during two natural hazards events: what twitter may contribute to situational awareness. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1079-1088). ACM.
Duggan, M., Ellison, N. B., Lampe, C., Lenhart, A., & Madden, M. (2015). Social media update 2014: While Facebook remains the most popular site, other platforms see higher rates of growth. Washington, DC: Pew Internet and American Life Project.
Statista. (2015). Number of monthly active Twitter users in the United States from 1st quarter 2010 to 1st quarter 2015 (in millions). Accessed April 2015. http://www.statista.com/statistics/274564/monthly-active-twitter-users-in-the-united-states/