March 28, 2015
This is a rehearsal for a three-hour PyCon tutorial. Space is limited. Please consider whether you have the needed background. No food will be provided.
Presenter: Eric Ma
Audience level: Intermediate
Category: Science
Description: Have you ever wondered how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. We will explore a bike sharing data set to understand the kinds of problems that can be solved using graph analytics.
Abstract
In this tutorial, I will show you how to construct networks from data and analyze them. The goal is to demystify graph analytics and mining, and to make them accessible to the general programmer. Using a toy data set as an anchor, we will go through:
• graph basics (nodes + edges, list and matrix representations),
• modelling problems as graphs,
• preprocessing data using Pandas,
• importing data using NetworkX,
• computing basic statistics of the network,
• generating visualizations using matplotlib,
• finding hubs, paths and clusters in the data,
• (if time permits) random graphs for statistical inference.
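To give a flavor of the first few topics above, here is a minimal sketch in plain Python. It deliberately avoids NetworkX so that the list and matrix representations stay visible. The station names and edges are hypothetical toy data, not the tutorial's data set, and `shortest_path` is an illustrative helper, not part of any library.

```python
from collections import deque

# Hypothetical toy data: each pair is a trip link between two bike stations.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

# Sorted node list so that matrix rows and columns have a stable order.
nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# List representation: each node maps to its neighbours.
adj_list = {node: [] for node in nodes}
# Matrix representation: entry [i][j] is 1 when nodes i and j share an edge.
adj_matrix = [[0] * len(nodes) for _ in nodes]

for u, v in edges:  # the graph is treated as undirected
    adj_list[u].append(v)
    adj_list[v].append(u)
    adj_matrix[index[u]][index[v]] = 1
    adj_matrix[index[v]][index[u]] = 1

# A basic statistic: the degree (number of neighbours) of each node.
degree = {node: len(neighbours) for node, neighbours in adj_list.items()}

# A simple notion of "hub": the node with the highest degree.
hub = max(degree, key=degree.get)


def shortest_path(source, target):
    """Breadth-first search for a shortest path in an unweighted graph.

    Returns the path as a list of nodes, or None if target is unreachable.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adj_list[current]:
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


print("hub:", hub)
print("path from B to E:", shortest_path("B", "E"))
```

Degree is only one of several ways to define a hub; it is used here because it is the simplest to compute by hand.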
IPython notebooks and data files will be distributed beforehand on GitHub.
As good pedagogical practice, we will have lots of guided hands-on time, plus 30 minutes to 1 hour of unstructured “free hacking time” to explore a bike sharing data set (with suggested questions) in small groups of whatever size you choose. You will also share your IPython notebooks via GitHub. After the hacking time, we will showcase a selection of the analyses.
Meetup link: https://www.meetup.com/bostonpython/events/221201510/