100daysofdatascience

Stanford SQL Exercises #100daysofcode #100daysofDataScience #Day27 #unicornassembly #netflixandcode

While working on the fundamentals (stats) I haven’t forgotten about keeping my tech skills sharp. Last week on Sunday I decided to take a break from z-scores and t-scores to do a little SQL practice. I’ve already blogged about my humble beginnings with SQL on Khan Academy but it’s nice to have a place to just write some queries without overhead.

The Stanford SQL Movie Rating Query Exercises provide just that! These were suggested to me by a friend prepping for data science interviews. It’s a great way to get your brain in SQL mode and to test your speed. So, start the timer and see how fast you can go through ‘em!

Here are the queries that I found the most difficult, along with my #netflixandcode solutions. They’re admittedly a little clunky, but they get the job done.

Question 6: For all cases where the same reviewer rated the same movie twice and gave it a higher rating the second time, return the reviewer's name and the title of the movie.

[Screenshot: my solution to Question 6]
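
For anyone who wants to try Question 6 locally, here is a minimal sketch of one possible approach, not necessarily the query from my screenshot. It runs the SQL through Python's built-in sqlite3 module. The table and column names follow the exercise's Movie / Reviewer / Rating schema, but the rows are made-up toy data (not the Stanford dataset), and the query assumes each reviewer rated a given movie at most twice and that rating dates are not NULL.

```python
import sqlite3

# In-memory database. The rows are made-up toy data, NOT the Stanford dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (mID INTEGER, title TEXT, year INTEGER, director TEXT);
CREATE TABLE Reviewer (rID INTEGER, name TEXT);
CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);

INSERT INTO Movie VALUES
  (1, 'Alpha', 1975, NULL),
  (2, 'Beta', 1990, NULL),
  (3, 'Gamma', 1960, NULL);
INSERT INTO Reviewer VALUES (10, 'Ann'), (11, 'Bob');
INSERT INTO Rating VALUES
  (10, 1, 2, '2011-01-22'),  -- Ann rates Alpha low...
  (10, 1, 4, '2011-01-27'),  -- ...then higher a few days later
  (11, 2, 4, '2011-01-12'),  -- Bob rates Beta high...
  (11, 2, 2, '2011-01-30'),  -- ...then lower
  (11, 3, 3, '2011-01-09');  -- Bob rates Gamma only once
""")

# Self-join Rating on (reviewer, movie), then keep pairs where the
# later rating (r2) has more stars than the earlier one (r1).
QUESTION_6 = """
SELECT rv.name, m.title
FROM Rating AS r1
JOIN Rating AS r2 ON r1.rID = r2.rID AND r1.mID = r2.mID
JOIN Reviewer AS rv ON rv.rID = r1.rID
JOIN Movie AS m ON m.mID = r1.mID
WHERE r1.ratingDate < r2.ratingDate
  AND r1.stars < r2.stars
"""

rows = conn.execute(QUESTION_6).fetchall()
print(rows)  # [('Ann', 'Alpha')]
```

The self-join is the key design choice: comparing the Rating table against itself lets a single WHERE clause check both "later date" and "more stars". If a reviewer could rate the same movie three or more times, the query would need an extra check to enforce "exactly twice".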

Question 9: Find the difference between the average rating of movies released before 1980 and the average rating of movies released after 1980. (Make sure to calculate the average rating for each movie, then the average of those averages for movies before 1980 and movies after. Don't just calculate the overall average rating before and after 1980.)

[Screenshot: my solution to Question 9]
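
As with Question 6, here is a hedged sketch of one way to approach Question 9, again run through Python's built-in sqlite3 module and not necessarily the query from my screenshot. The schema follows the exercise, the rows are made-up toy data, and the CTE name per_movie is just an illustrative choice. The sketch also computes the "wrong" overall-average version so you can see why the question warns against it.

```python
import sqlite3

# In-memory database. The rows are made-up toy data, NOT the Stanford dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (mID INTEGER, title TEXT, year INTEGER, director TEXT);
CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);

INSERT INTO Movie VALUES
  (1, 'Alpha', 1975, NULL),
  (2, 'Beta', 1990, NULL),
  (3, 'Gamma', 1960, NULL);
INSERT INTO Rating VALUES
  (10, 1, 2, '2011-01-22'),
  (10, 1, 4, '2011-01-27'),  -- Alpha (1975) averages 3.0
  (11, 3, 5, '2011-01-09'),  -- Gamma (1960) averages 5.0
  (11, 2, 4, '2011-01-12'),
  (11, 2, 2, '2011-01-30');  -- Beta (1990) averages 3.0
""")

# Step 1: average rating per movie. Step 2: average those averages
# on each side of 1980 and subtract.
QUESTION_9 = """
WITH per_movie AS (
  SELECT m.mID, m.year, AVG(r.stars) AS avg_stars
  FROM Movie AS m
  JOIN Rating AS r ON r.mID = m.mID
  GROUP BY m.mID, m.year
)
SELECT
  (SELECT AVG(avg_stars) FROM per_movie WHERE year < 1980)
  - (SELECT AVG(avg_stars) FROM per_movie WHERE year > 1980)
"""

# The shortcut the question warns about: averaging all ratings directly,
# which lets movies with more ratings count more heavily.
NAIVE = """
SELECT
  (SELECT AVG(stars) FROM Rating JOIN Movie USING (mID) WHERE year < 1980)
  - (SELECT AVG(stars) FROM Rating JOIN Movie USING (mID) WHERE year > 1980)
"""

difference = conn.execute(QUESTION_9).fetchone()[0]
naive_difference = conn.execute(NAIVE).fetchone()[0]
print(difference)        # (3.0 + 5.0) / 2 - 3.0 = 1.0
print(naive_difference)  # (2 + 4 + 5) / 3 - 3.0, roughly 0.667
```

In the toy data, Alpha has two ratings and Gamma has one, so the overall average is pulled toward Alpha. Averaging per movie first gives every movie equal weight, which is what the question asks for.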

Statistics wizard in the making thanks to @khanacademy #100daysofdatascience #100daysofcode #Day20 #unicornassembly

Khan Academy puts the FUN in Fundamentals with the AP Stats Course

My Course Progress

I rely heavily on internet courses and certifications to brush up on skills. Before using Khan Academy I thought of it as a study aid for kids. That changed when I was looking for a fundamental course on SQL. The Khan Academy SQL course was a godsend and truly made for beginners. There were lots of practice problems in a structured environment with great feedback. Since then, I’ve played around in the calculus and computer science courses.

For #100daysofdatascience I’ve committed to running through the entire AP Statistics course and then taking a mock exam at the end. The complete 2012 exam is posted online, so that will be my go-to. I am about halfway through the course and here are my thoughts so far:

I have used descriptive statistics pretty much every day for the last decade. I regularly use inferential statistics and more complex descriptive statistics for modeling and machine learning. These tools are fundamental to hypothesis testing. It takes a lot of motivation to force yourself to go through subject matter you already know, but the result is that I have a renewed confidence in my stats tool kit and a fresh grasp of the fundamentals. That said, I will warn that the course can be a bit frustrating at times. Occasionally the questions are a bit subjective or very strict about the format of submissions. This is also true for the AP exam and doubly true for most online courses, so it’s not especially surprising. It does make things go a bit slower, especially for a perfectionist who wants to get every question right!

The Long Road to Becoming a "Unicorn" #100daysofDataScience

Some Assembly Required

Data Science Venn Diagram by Drew Conway.

At some point I stumbled across Drew Conway’s Venn diagram of data science. In some versions you will see “data scientist” replaced with “unicorn”. That was my thought looking at this picture: only a crazy person could or would be an expert in all three. But I’m a glutton for punishment and took this as a personal challenge. I found myself approaching each of these bubbles as you would coursework for a major or degree. I built a little rubric in my head with internet courses to take and HackerRank problems to solve.

Fast forward a few years and I’ve realized this unicorn has a few more horns. In the business world, creating data insights and predictions without the ability to communicate those ideas makes you much less effective. That doesn’t discount the importance of the other three areas; communication is just an additional ingredient necessary for success. Stephan Kolassa highlights the need for communication skills in his update to the Conway Venn diagram.

 
Stephan Kolassa’s data scientist

I still think that Hacking Skills, Math & Statistics Knowledge and Substantive Expertise (shortened to "Programming", "Statistics" and "Business" for legibility) are important... but I think that the role of Communication is important, too. All the insights you derive by leveraging your hacking, stats and business expertise won't make a bit of a difference unless you can communicate them to people who may not have that unique blend of knowledge. —Stephan Kolassa

 
Disassembled Unicorn from Frugal Fun for Boys and Girls.

Luckily, communication of highly technical subject matter was covered exhaustively during my PhD. That training helped immensely, but it means each project has another layer. It also expands the list of ‘topics to master’, so my data science “course rubric” is growing. There are so many new skills to learn and so many little boxes on Kolassa’s diagram to color in!

So, I’ve decided to dedicate some time to data science topics for the next few months (#100daysofdatascience for #unicornassembly). I will post here about my successes (and failures) as I go!