Tutorial: Analyzing Airline Data and Scraping IMDb with R
Welcome to the companion website of the R Tutorial. This website includes the tutorial R Markdown files that analyze United Airlines data, together with an example of web scraping.
- R is one of the most popular tools used for data analysis
- It can handle large amounts of data (limited mainly by your computer’s memory)
- It’s well suited to unstructured data (Twitter, Facebook, news, Reddit, LinkedIn, IMDb, etc.)
- It makes some of the best pictures in the business
- It records every step of your analysis (easier replicability!)
- It has a huge online community of users/supporters
What will we do in the tutorial?
We will get an introduction to R by following the steps recommended by the book “R for Data Science” (reference below). These are basic steps that anyone wishing to embark on a data analysis project should follow:
- Import (from a file, a database, a website, etc.)
- Tidy (mainly clean the data and make it consistent)
- Understand: Transform -> Visualize -> Model -> Repeat … (Graphs, Statistical Models, Machine Learning, etc)
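The steps above can be sketched in a few lines of R. This is only a minimal illustration, assuming the readr, dplyr and ggplot2 packages (all part of the tidyverse) are installed; the inline data, the column names and the object names are hypothetical and are not taken from the tutorial files.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Import: read a small inline CSV (hypothetical values standing in for a real file)
raw <- read_csv(I("carrier,dep_delay,distance
UA,10,500
UA,NA,1200
AA,-3,800
UA,25,2400
AA,5,300
"), show_col_types = FALSE)

# Tidy: drop the rows where the departure delay is missing
flights <- raw %>% filter(!is.na(dep_delay))

# Transform: mean delay and number of flights per carrier
delay_by_carrier <- flights %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay), n = n())

# Visualize: build a scatter plot of delay against distance (call print(p) to draw it)
p <- ggplot(flights, aes(x = distance, y = dep_delay, colour = carrier)) +
  geom_point()

# Model: a simple linear regression of delay on distance
fit <- lm(dep_delay ~ distance, data = flights)
```

In a real project the Transform, Visualize and Model steps are usually repeated several times as each one suggests new questions.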
What should I do before the tutorial?
You can get ready for the tutorial by either
- Installing R and RStudio on your own computer
- Or signing up for RStudio Cloud before the tutorial. It’s free. It’s easy. It’s in the cloud. If you want, you can log in, create a project and start playing with RStudio.
- Then, on the day of the tutorial (or before, if you have the time), please log in to your RStudio Cloud account, create a project and upload the tutorial files to your new project.
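Once R is running, either locally or in the cloud, a quick check like the one below can show which packages still need to be installed. It is a sketch only: the package list is an assumption based on the packages referenced on this page, not an official list of what the tutorial files require.

```r
# Packages assumed to be useful for the tutorial (an assumption, adjust as needed)
pkgs <- c("tidyverse", "pivottabler", "glmnet")

# requireNamespace() returns FALSE for any package that cannot be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  message("Install them with install.packages(), e.g. install.packages(\"glmnet\")")
} else {
  message("All packages are available.")
}
```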
- Scraping Target: IMDb Top 100 Feature Films (1990-2016)
- Scraping Code => Scraping Report
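To give a flavour of what scraping code looks like, here is a minimal sketch using the rvest package (assumed to be installed, version 1.0 or later). It does not contact IMDb: the HTML snippet, the CSS class names and the film entries are all hypothetical placeholders, and a real page would need its own selectors.

```r
library(rvest)

# Hypothetical HTML standing in for a film-list page (not real IMDb markup)
page <- minimal_html('
  <div class="film"><span class="title">Film A</span><span class="year">1994</span><span class="rating">9.3</span></div>
  <div class="film"><span class="title">Film B</span><span class="year">1985</span><span class="rating">8.5</span></div>
  <div class="film"><span class="title">Film C</span><span class="year">2008</span><span class="rating">9.0</span></div>
  <div class="film"><span class="title">Film D</span><span class="year">2010</span><span class="rating">8.8</span></div>
')

# Select one node per film, then pull each field out of it
films <- html_elements(page, ".film")
all_films <- data.frame(
  title  = html_text2(html_element(films, ".title")),
  year   = as.integer(html_text2(html_element(films, ".year"))),
  rating = as.numeric(html_text2(html_element(films, ".rating"))),
  stringsAsFactors = FALSE
)

# Keep films released between 1990 and 2016, best rated first
top_films <- subset(all_films, year >= 1990 & year <= 2016)
top_films <- top_films[order(-top_films$rating), ]
```

For a live page, `read_html()` with the page URL would take the place of `minimal_html()`; the rest of the pipeline stays the same once the right selectors are known.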
- Wickham, H. and Grolemund, G. (2017) “R for Data Science”, O’Reilly Media
- R Markdown Cheatsheet
- Tidyverse: a collection of the packages most commonly used in data science
- Documentation of ggplot2: package to make nice-looking graphs
- Documentation of pivottabler: package to make pivot tables
- Documentation of glmnet: package that fits generalized linear models (with lasso and elastic-net regularization)