PyCon Rehearsal 1 - Comparing Unicode and Predicting Oscars

April 27, 2017

   

Two rehearsals of upcoming PyCon talks, sponsored and hosted by Wayfair (https://www.wayfaircareers.com/).

Text is More Complicated Than You Think: Comparing and Sorting Unicode

Morgan Wahl

Few people realize just how complicated text can be. Did you know sorting and even case-folding can depend on a user’s locale? That different strings of characters can be semantically completely equivalent? That there are over a thousand Latin letters?

Legacy text encodings like ASCII made a lot of simplifying assumptions about how written languages work, and we all put up with them because it was cool to even have computers in the first place. Unicode removes many of those assumptions and provides the tools we need to write software that can just do the right thing regardless of what text users throw at it. Even if you don’t translate your UI, getting the details of string comparison, sorting, and searching right can eliminate annoying surprises for you and your users.
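
As a small illustration of the "equivalent strings" point in the abstract (a sketch using only Python's standard library, not material from the talk itself): the same visible character can be stored as different code point sequences, so a plain `==` can miss matches that normalization and case-folding catch. The helper names `canonical_equal` and `caseless_equal` are made up for this example.

```python
import unicodedata

# Two ways to store the same visible character "é".
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def canonical_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_equal(a, b):
    """Case-insensitive comparison: normalize, case-fold, normalize again."""
    def fold(s):
        nfd = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", nfd.casefold())
    return fold(a) == fold(b)


print(composed == decomposed)                  # raw comparison of code points
print(canonical_equal(composed, decomposed))   # comparison after normalization
print(caseless_equal("Straße", "STRASSE"))     # case-folding expands "ß" to "ss"
```

Locale-aware sorting is a separate problem that this sketch does not cover; it generally needs collation rules for the user's locale rather than a comparison of code points.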

Lights, camera, action! Scraping a great dataset to predict Oscar winners

Deborah Hanus

Using Jupyter notebooks and scikit-learn, you’ll predict whether a movie is likely to win an Oscar (http://oscarpredictor.github.io/) or be a box office hit. Together, we’ll step through the creation of an effective dataset: asking a question your data can answer, writing a web scraper, and answering those questions using nothing but Python libraries and data from the Internet.

Pizza (and I think beer!) will be provided.

Meetup link: https://www.meetup.com/bostonpython/events/238238860/
