Neural Nets - units and decision boundaries

I have an assignment that involves building a language identifier (given text, predict which language is the text from) using Neural Nets. I wanted to use this opportunity to make a few posts to cement the idea into my head. I hope you find this intuitive and helpful. So lets talk about NN in terms of classification, when I think of neural networks, I think of applying a multi-layered system to make a decision. [Read More]

Exploring flight data

Background The given dataset comes from the Office of Airline Information, Bureau of Transportation Statistics (BTS). It consists of contains on-time arrival data for non-stop domestic flights by major air carriers, monthly from 1987 to 2017, and provides information such as departure and arrival delays, origin and destination airports, flight numbers, scheduled and actual departure and arrival times, canceled or diverted flights, taxi-out and taxi-in times, air time, and non-stop distance. [Read More]

Data from DataCamp

DataCamp boasts to be “the easiest way to learn Data Science Online” and has courses of different levels taught using R or python. I wanted to see how popular some of the courses were and which technology they used, so a quick use of rvest was required. Rvest is a package developed by Hadley Wickham that allows one to easily scrape web pages. Below is the function used to get the relative links for each course based on the technology used. [Read More]

GSOC III

This week was a busy one. Implemented most of the survey summary statistics and the jackknife to estimate their standard error. This was my first time learning about the jackknife and I thought it was an interesting topic to tell you all about. Within survey data, you tend to have strata and PSUs within the strata that make up subgroups. Now, let’s assume that we want to calculate the mean for your survey data - this entails something along the lines of [Read More]

GSOC II

Another blog post about my journey with GSOC. This week I’m implementing some summary statistics for survey data. Surrvey methodology quite resembles ordinary statistical methodology, but there are some pervasive subtleties there due to the importance of the study design for doing standard errors, tests and confidence intervals in a defensible manner. Learning what others have done to make robust analyses with their survey data is quite fun. The hard part is of course the implementation. [Read More]

GSOC I

This summer I’ll be working on implementing Survey Methods in StatsModels thanks to Google Summer of Code (GSOC). I’m excited because I took my first official programming course last semester (C++ seems to stand strong against the test of time) and get to learn more about open source software development. I still remember the first programming course I signed up for during my freshman year at Rice. I dropped after the first week. [Read More]