Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization

Speaker: Ryan Compton
Category: Colloquium
Affiliation: Howard Hughes Research Laboratories
Location: MSE 116

Date: Monday, November 3, 2014
Time: 11:10am–12:00pm


Geographically annotated social media data is extremely valuable for modern information retrieval. However, when researchers can access only publicly visible data, they quickly find that social media users rarely publish location information. In this work, we provide a method that can geolocate the overwhelming majority of active Twitter users, independent of their location-sharing preferences, using only publicly visible Twitter data.

Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total-variation-based objective and provide a scalable, distributed algorithm for its solution.
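
One way to read the total-variation formulation is sketched below, under our own assumptions rather than as the speaker's implementation. Give every user i a location f_i, hold users with known locations fixed, and minimize the sum over friendship edges (i, j) of the distance between f_i and f_j. With all neighbors fixed, each user's subproblem is the geometric median of their located friends. A parallel update can therefore repeatedly move every unknown user to the geometric median of their neighbors' previous estimates. The sketch assumes planar coordinates in kilometers (e.g. after a local map projection), unit edge weights, and Weiszfeld's algorithm for the median. The names `geometric_median` and `infer_locations` are hypothetical.

```python
import math


def geometric_median(points, iters=100, eps=1e-9):
    """Geometric median of 2-D points via Weiszfeld's algorithm (unit weights).

    Approximately minimizes sum_j ||x - p_j||, the per-user subproblem when
    all of a user's neighbors are held fixed.
    """
    # Start from the centroid.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < eps:
                # Simplification: stop if the iterate lands on a data point.
                return (px, py)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        nx, ny = num_x / denom, num_y / denom
        if math.hypot(nx - x, ny - y) < eps:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)


def infer_locations(edges, known, sweeps=10):
    """Approximately minimize sum over edges of ||f_i - f_j||, known users fixed.

    Each sweep moves every unknown user to the geometric median of their
    neighbors' locations from the previous sweep (Jacobi-style updates).
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    loc = dict(known)
    for _ in range(sweeps):
        new_loc = dict(loc)
        for user, nbrs in neighbors.items():
            if user in known:
                continue  # users with known locations stay fixed
            pts = [loc[v] for v in nbrs if v in loc]
            if pts:  # users with no located neighbors wait for a later sweep
                new_loc[user] = geometric_median(pts)
        loc = new_loc
    return loc


# Toy graph: "u" has three located friends; "w" knows only "u".
known = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (10.0, 0.0)}
edges = [("u", "a"), ("u", "b"), ("u", "c"), ("w", "u")]
estimates = infer_locations(edges, known, sweeps=5)
print(estimates["u"], estimates["w"])  # both close to (0.0, 0.0)
```

In the toy graph, the median ignores the single distant friend at (10, 0), which a mean would not. Because each update reads only the previous sweep's estimates, the inner loop could in principle be split across workers. This toy version makes no convergence guarantee.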
Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure, allowing us to discard poor location inferences and control the overall error of our approach.
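
A per-user accuracy filter of this kind could look like the following sketch. The abstract does not specify the estimator, so the choice here is an assumption: the median distance from a user's inferred location to their located friends. Users whose dispersion exceeds a hypothetical `max_dispersion_km` threshold are discarded. Planar coordinates in kilometers are assumed, and the function names are illustrative.

```python
import math
from statistics import median


def dispersion(center, friend_points):
    """Median planar distance (km) from an inferred location to located friends."""
    return median([math.hypot(center[0] - px, center[1] - py) for px, py in friend_points])


def filter_confident(estimates, friend_points_by_user, max_dispersion_km):
    """Keep only users whose ego-network dispersion is at most the threshold."""
    kept = {}
    for user, center in estimates.items():
        pts = friend_points_by_user.get(user, [])
        if pts and dispersion(center, pts) <= max_dispersion_km:
            kept[user] = center
    return kept


estimates = {"u": (0.0, 0.0), "v": (0.0, 0.0)}
friends = {
    "u": [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)],      # distances 0, 5, 1: median 1
    "v": [(100.0, 0.0), (0.0, 200.0), (300.0, 0.0)],  # distances 100, 200, 300: median 200
}
print(filter_confident(estimates, friends, max_dispersion_km=50.0))  # {'u': (0.0, 0.0)}
```

Using the median rather than the mean keeps a few distant friends from dominating the score. Lowering the threshold trades coverage for accuracy.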

Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.
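
A leave-many-out evaluation can be sketched as follows. Hide the known locations of a random subset of users, run inference on the rest, and report the median great-circle error on the hidden users. The `infer` callable, the 10% holdout fraction, and the function names are hypothetical. Locations are (latitude, longitude) pairs in degrees, and distances use the haversine formula with a mean Earth radius of 6371 km.

```python
import math
import random
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def leave_many_out_median_error(known, infer, holdout_frac=0.1, seed=0):
    """Hide a random subset of known users, infer, and return the median error (km).

    `infer` takes the remaining known locations and returns a dict of
    estimated locations covering (ideally) every user.
    """
    rng = random.Random(seed)
    users = sorted(known)
    held = set(rng.sample(users, max(1, int(holdout_frac * len(users)))))
    training = {u: p for u, p in known.items() if u not in held}
    estimates = infer(training)
    errors = [haversine_km(estimates[u], known[u]) for u in held if u in estimates]
    return median(errors) if errors else float("nan")
```

Held-out users that the method cannot place are skipped here. A real evaluation would also report the fraction of held-out users that received an estimate.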


Ryan Compton is a postdoc in the Information and System Sciences Laboratory at Howard Hughes Research Laboratories in Malibu, CA. His work focuses on social media data mining for early detection of newsworthy events. In 2012, Ryan finished a mathematics PhD at UCLA with a thesis on sparsity-promoting optimization for quantum mechanical signal processing. His website is