How have the different teams performed, and what types of shots do the well-performing teams take?

OTVgroup
Apr 7, 2020

The standings of different seasons

After exploring the design space and getting to know the data, we started to dig deeper into the data to find answers to our research questions. Before we could answer more complicated questions about the dynamics of a good basketball team, we needed to figure out how the different teams had performed over the analysis period. We had already made a sketch for this purpose in one of our design space exploration sessions, and decided to build a graphic based on it. The idea of the sketch was to visualize the standings of each season in a league-wide table, so that we could see how each team had been positioned during the regular seasons 2014-2018.

To make the visual, we needed to aggregate the game results dataset so that we could recreate the standings of each NBA regular season. For this, the following R script was used.

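library(dplyr)
library(jsonlite)  # provides fromJSON()

data <- fromJSON("https://gist.githubusercontent.com/kangaroo2020/7afc204f5d1274051aa12dd54e8f3f9b/raw/83ac178f83235ca25ebae8f680f3bc7fb2fb7c95/nba_data2.json")
# Team-level categorical variables, reattached after the aggregation
cat_vars <- distinct(select(data, c("Team", "Division", "Conference")))
# Encode the result as 0/1 (assuming the values "L" and "W"), so that its mean is the win percentage
data2 <- mutate(data, win_percentage = as.integer(as.factor(WINorLOSS)) - 1)
data2 <- select(data2, -WINorLOSS)

# Average every column per team and season; non-numeric columns become NA (with warnings)
data_agg2 <- data2 %>%
  group_by(Team, season) %>%
  summarize_all(mean) %>%
  ungroup() %>%
  arrange(season, desc(win_percentage))

# Drop the columns that contain NA, then reattach the team-level categorical variables
data_agg2 <- data_agg2 %>% select_if(~ !any(is.na(.)))
data_agg2 <- left_join(data_agg2, cat_vars, by = "Team")

# Standing within each season; assumes 30 teams in each of 4 seasons
data_agg2$position <- rep(seq(1, 30), 4)

# write.csv() always writes the header row, so col.names is not passed
write.csv(data_agg2, file = "season_team_avg.csv", quote = FALSE)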

Due to the large number of columns, the function summarize_all() was used, even though it produces NA values for the categorical variables. It was easier to first extract the team-specific categorical variables we wanted to preserve, then perform the aggregation, drop the NA-valued categorical variables, and finally reattach the team-specific categorical variables to the aggregated dataframe. The dataframe was sorted by season and win percentage to recreate the standings. Based on this dataframe, the following visual was created using Vega-Lite.
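As a rough illustration of the aggregation described above, here is a minimal sketch on a toy game log. It is not the script we used: the team names and values are made up, it keeps the categorical column by grouping on it (assuming it is constant per team), and it numbers the positions within each season with row_number() instead of relying on a fixed number of teams.

```r
library(dplyr)

# Toy game log with hypothetical teams and results (not the real dataset)
games <- data.frame(
  Team      = c("AAA", "AAA", "BBB", "BBB", "AAA", "AAA", "BBB", "BBB"),
  season    = c(2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016),
  WINorLOSS = c("W", "W", "W", "L", "L", "L", "W", "L"),
  Division  = c("East", "East", "West", "West", "East", "East", "West", "West"),
  stringsAsFactors = FALSE
)

standings <- games %>%
  # 1 for a win, 0 for a loss, so the mean is the win percentage
  mutate(win_percentage = as.integer(WINorLOSS == "W")) %>%
  # Grouping by Division as well keeps it in the result (assumed constant per team)
  group_by(Team, season, Division) %>%
  # Average only the numeric columns, so no NA columns are produced
  summarize(across(where(is.numeric), mean), .groups = "drop") %>%
  arrange(season, desc(win_percentage)) %>%
  # Number the teams within each season in the sorted order
  group_by(season) %>%
  mutate(position = row_number()) %>%
  ungroup()

print(standings)
```

Averaging only the numeric columns avoids the drop-and-reattach step, at the cost of having to list the categorical columns in group_by().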

At first glance the visual might look a bit confusing, but it gives a good overview of the teams' performance. Hovering the mouse over a team's line highlights it and allows a more in-depth analysis of that team's positions over the seasons. From the visual, it is easy to see that teams like the Golden State Warriors (GSW), Houston Rockets (HOU), Cleveland Cavaliers (CLE) and San Antonio Spurs (SAS) have performed rather well over the seasons, while teams like the Sacramento Kings (SAC), Los Angeles Lakers (LAL), New York Knicks (NYK) and Phoenix Suns (PHO) have performed below average. This information is crucial for further analysing the statistics that possibly contribute to the success of a basketball team.

The effect of 2- and 3-point shots

The first statistic analyzed was the split between 2-point and 3-point shots taken by the teams. A visual for analyzing this was drafted in the design space exploration session where we combined different initial sketches, and is presented below.

For this visual, the season- and team-specific aggregated dataframe was further aggregated to contain the average statistics for each team over all of the seasons, using the following R code.

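# Average every column per team over all seasons; non-numeric columns become NA
data_agg3 <- data_agg2 %>% group_by(Team) %>% summarize_all(mean)
# Drop the columns that contain NA, then reattach the team-level categorical variables
data_agg3 <- data_agg3 %>% select_if(~ !any(is.na(.)))
data_agg3 <- left_join(data_agg3, cat_vars, by = "Team")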

The data was then again visualized using Vega-Lite, with the modification that the hue represents the success rate of field goal attempts instead of the number of fouls.

The larger marks represent better-performing teams, whereas the shade of a mark represents the percentage of successful field goal attempts (field goals = 2-point shots + 3-point shots). It can easily be seen that the larger marks are concentrated in the lower right corner of the chart, indicating that the better-performing teams shoot more 3-point shots than the lower-performing teams, with the exception of the San Antonio Spurs (SAS), who sit in the upper left corner. It is also notable that the Golden State Warriors (GSW) are in the middle of the chart, but compensate for that with the best success percentage for field goal attempts. As all of the bottom-of-the-table teams sit in the upper left corner, this reinforces the conclusion that teams who shoot relatively more 3-pointers tend to succeed better than those who shoot more 2-pointers.
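The quantities behind the chart can be derived from the per-team averages along the following lines. This is only a sketch: the column names (fg_made, fg_attempted, three_attempted) and the numbers are hypothetical and do not come from the actual dataset.

```r
library(dplyr)

# Hypothetical per-team averages; column names and values are assumptions
team_avg <- data.frame(
  Team            = c("AAA", "BBB"),
  fg_made         = c(40, 36),
  fg_attempted    = c(80, 90),
  three_attempted = c(32, 18),
  stringsAsFactors = FALSE
)

shot_profile <- team_avg %>%
  mutate(
    # Field goal attempts are 2-point attempts plus 3-point attempts
    two_attempted = fg_attempted - three_attempted,
    # Share of all field goal attempts that are 3-pointers
    three_share   = three_attempted / fg_attempted,
    # Success rate of field goal attempts (the hue in the chart)
    fg_success    = fg_made / fg_attempted
  )

print(shot_profile)
```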
