Priyanga D. Talagala
IASSL Workshop on Data visualization with R and Python
27/5/2023
Using the palmerpenguins Python package, you can easily load the Palmer penguins data into your Python environment.
# import sys
# !{sys.executable} -m pip install palmerpenguins
from palmerpenguins import load_penguins
penguins = load_penguins()
penguins.head()
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | 2007 |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |
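Row 3 of the preview has NaN in every measurement column. A minimal sketch of how such rows could be counted and dropped before plotting is given below. It uses a hand-typed copy of the five rows shown above, and the names `preview`, `missing_per_column` and `complete` are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the five rows printed by penguins.head() above
preview = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "island": ["Torgersen"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
    "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "sex": ["male", "female", "female", np.nan, "female"],
    "year": [2007] * 5,
})

# Count missing values per column
missing_per_column = preview.isna().sum()
print(missing_per_column)

# Keep only the rows where both plotted measurements are present
complete = preview.dropna(subset=["flipper_length_mm", "body_mass_g"])
print(len(preview), "rows before,", len(complete), "rows after")
```

The same two calls should work unchanged on the full `penguins` DataFrame.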
There are several popular plotting packages available in Python.
The ones used in this workshop are plotnine, Matplotlib, Seaborn, and Plotly.
## This code will generate an error message: a Python statement cannot end a line with a dangling +
ggplot(penguins, aes(x='species')) +
geom_bar(fill='steelblue') +
labs(x='Species', y='Count', title='Number of Penguins by Species')
  Cell In[81], line 2
    ggplot(penguins, aes(x='species')) +
                                        ^
SyntaxError: invalid syntax
# import sys
# !{sys.executable} -m pip install plotnine
from plotnine import *
ggplot(penguins, aes(x='species')) + geom_bar(fill='steelblue') + labs(x='Species', y='Count', title='Number of Penguins by Species')
<Figure Size: (640 x 480)>
ggplot(penguins, aes(x='species')) + \
geom_bar(fill='steelblue') + \
labs(x='Species', y='Count', title='Number of Penguins by Species')
<Figure Size: (640 x 480)>
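Besides the backslash, wrapping the whole expression in parentheses is another common way to split a long chain of + over several lines. The sketch below uses plain arithmetic so that it runs without plotnine. The plotnine version appears only as a comment and is not from the original notebook.

```python
# A statement that ends a line with a dangling "+" is rejected by the parser
broken_source = "total = 1 +\n2 +\n3\n"
try:
    compile(broken_source, "<cell>", "exec")
    broken_is_valid = True
except SyntaxError:
    broken_is_valid = False

# Inside parentheses, Python joins the lines implicitly, so no backslash is needed
total = (
    1
    + 2
    + 3
)

print(broken_is_valid, total)

# The same pattern applied to the plotnine chart (shown as a comment only):
# (
#     ggplot(penguins, aes(x='species'))
#     + geom_bar(fill='steelblue')
#     + labs(x='Species', y='Count', title='Number of Penguins by Species')
# )
```

Starting each line with the operator also makes it easy to comment out a single layer while experimenting.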
import matplotlib.pyplot as plt
species_counts = penguins["species"].value_counts()
plt.bar(species_counts.index, species_counts.values)
plt.xlabel("Species")
plt.ylabel("Count")
plt.title("Number of Penguins by Species")
plt.show()
Note: Call plt.show() to display the plot. Placing plt.show() at the end of a cell or script makes sure the plot is shown correctly.
## Method 1 using countplot
import seaborn as sns
ax = sns.countplot(data=penguins, x="species")
ax.set_title("Number of Penguins by Species")
plt.show()
sns.countplot(data=penguins, x="species", color="steelblue")
plt.xlabel("Species")
plt.ylabel("Count")
plt.title("Number of Penguins by Species")
plt.show()
## Using bar plot
species_counts = penguins["species"].value_counts()
species_counts
sns.barplot(x=species_counts.index, y=species_counts.values, color="steelblue")
plt.xlabel("Species")
plt.ylabel("Count")
plt.title("Number of Penguins by Species")
plt.show()
The popularity of a library is influenced by several factors, including its functionality, ease of use, community support, and historical adoption.
While ggplot is a popular and highly regarded library in the R programming language, its adoption in Python has been relatively limited.
There are a few reasons for this: familiarity, compatibility, and active community and documentation.
Seaborn divides its plots into the following categories based on the types of relationships between variables:
Relational Plots: These plots are used to visualize the relationship between two numeric variables. Some examples include scatter plots (scatterplot()), line plots (lineplot()), and the figure-level relplot().
Categorical Plots: These plots are used to show the relationship between one numeric variable and one categorical variable. They can help visualize distributions, comparisons, and aggregations across categories. Some examples include bar plots (barplot()), count plots (countplot()), box plots (boxplot()), and violin plots (violinplot()).
Distribution Plots: These plots are used to visualize the distribution of a single variable or the relationship between multiple variables. They help in understanding the underlying distribution and identifying patterns. Some examples include histograms (histplot()), kernel density estimation plots (kdeplot()), and empirical cumulative distribution plots (ecdfplot()).
Regression Plots: These plots are used to visualize the relationship between two numeric variables and fit regression models to the data. They help in understanding the linear or non-linear relationship between variables. Some examples include scatter plots with regression lines (regplot()), residual plots (residplot()), and regression joint plots (jointplot() with kind="reg").
Matrix Plots: These plots are used to display the relationships between multiple variables as a matrix. They are particularly useful for visualizing correlation matrices and covariance matrices. Some examples include heatmap plots (heatmap()) and clustermap plots (clustermap()).
These categories provide a comprehensive set of plotting options in Seaborn to address various types of relationships and data structures.
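Matrix plots are the one category not demonstrated elsewhere in this document. A minimal sketch of a correlation heatmap is given below. It uses only the four complete rows from penguins.head(), so the coefficients are not meaningful. The names `measurements` and `corr` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The four complete rows from penguins.head() (too few rows for real conclusions)
measurements = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7],
    "bill_depth_mm": [18.7, 17.4, 18.0, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0],
})

# Pairwise Pearson correlations between the numeric columns
corr = measurements.corr()

# Draw the matrix as a heatmap, printing each coefficient in its cell
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="RdBu")
ax.set_title("Correlation matrix (illustrative)")
plt.tight_layout()
```

Fixing vmin and vmax to -1 and 1 keeps the diverging colour scale centred on zero.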
For more plotting options, visit the Python Graph Gallery.
The basic steps to create a plot with Seaborn are:
1. Prepare some data
2. Control figure aesthetics
3. Plot with Seaborn
4. Further customize your plot
5. Show your plot
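The five steps above can be sketched in a single cell. This is only an illustration under stated assumptions: the data are the four complete rows from penguins.head(), and the labels and title are hypothetical choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# 1. Prepare some data (the four complete rows from penguins.head())
data = pd.DataFrame({
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0],
    "sex": ["male", "female", "female", "female"],
})

# 2. Control figure aesthetics
sns.set_theme(style="whitegrid")

# 3. Plot with Seaborn
ax = sns.scatterplot(data=data, x="flipper_length_mm", y="body_mass_g", hue="sex")

# 4. Further customize your plot
ax.set_xlabel("Flipper length (mm)")
ax.set_ylabel("Body mass (g)")
ax.set_title("Workflow sketch")

# 5. Show your plot
plt.show()
```

Seaborn's axes-level functions return a Matplotlib Axes object, which is why step 4 can use ordinary Matplotlib methods.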
There are two ways to create a plot using Seaborn:
Pass a DataFrame to the data= argument, while passing column names to the axes arguments, x= and y=.
Pass Series of data directly to the axes arguments, x= and y=.
Here's an example of creating a bar plot using the two different methods in Seaborn:
# Method 1: Pass DataFrame to data= argument and column name to x=
sns.countplot(data = penguins, x = 'species')
plt.show()
# Method 2: Pass Series of data to x= argument
sns.countplot(x = penguins['species'])
plt.show()
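Both calling styles should draw the same bars. The sketch below checks this on a toy DataFrame with made-up rows (not the real penguins data) by comparing the bar heights on two axes. The names `toy`, `ax_frame` and `ax_series` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for the penguins DataFrame (made-up rows, not real data)
toy = pd.DataFrame(
    {"species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"]}
)

fig, (ax_frame, ax_series) = plt.subplots(1, 2, figsize=(8, 3))

# Method 1: DataFrame plus column name
sns.countplot(data=toy, x="species", ax=ax_frame)
# Method 2: Series passed directly
sns.countplot(x=toy["species"], ax=ax_series)

# Collect the bar heights drawn by each call
heights_frame = [bar.get_height() for bar in ax_frame.patches]
heights_series = [bar.get_height() for bar in ax_series.patches]
print(heights_frame, heights_series)
```

Method 1 tends to read better when several columns of the same DataFrame are mapped to different arguments.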
# Create a scatter plot using Seaborn
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g')
plt.show()
# Create a scatter plot with color and shape aesthetics using Seaborn
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue='species', style='species')
plt.show()
# Create a scatter plot with conditional coloring using Seaborn
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue=penguins['flipper_length_mm'] < 205)
plt.show()
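The condition passed to hue= can also be stored as a named boolean column first, which keeps the plotting call short and makes the condition reusable. A minimal sketch on made-up values follows. The column name `flipper_under_205` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Toy stand-in for penguins (made-up values, for illustration only)
toy = pd.DataFrame({
    "flipper_length_mm": [181.0, 195.0, 210.0, 222.0],
    "body_mass_g": [3700.0, 3300.0, 4800.0, 5400.0],
})

# Store the condition as a named boolean column (hypothetical name)
toy["flipper_under_205"] = toy["flipper_length_mm"] < 205

# Colour the points by the stored condition
ax = sns.scatterplot(
    data=toy,
    x="flipper_length_mm",
    y="body_mass_g",
    hue="flipper_under_205",
)
```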
# Create a scatter plot with purple color using Seaborn
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', color='purple')
plt.show()
# Create a scatter plot with color and shape based on species using Seaborn
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue='species', style='species')
# Add a 2D density plot
sns.kdeplot(data=penguins, x='flipper_length_mm', y='body_mass_g', color='black', fill=True, alpha=0.3)
# Show the plot
plt.show()
sns.scatterplot(
data=penguins, # Specify the DataFrame
x='flipper_length_mm', # Specify the x-axis variable
y='body_mass_g', # Specify the y-axis variable
hue='species', # Color the points based on species
style='island' # Use different point shapes based on island
)
plt.show()
# Define the color palette
cols = {"Adelie": "red", "Chinstrap": "blue", "Gentoo": "darkgreen"}
# Create a scatter plot
sns.scatterplot(
data=penguins, # Specify the DataFrame
x='flipper_length_mm', # Specify the x-axis variable
y='body_mass_g', # Specify the y-axis variable
hue='species', # Color the points based on species
palette=cols # Use the defined color palette
)
# Display the plot
plt.show()
sns.scatterplot(
data=penguins,
x='flipper_length_mm',
y='body_mass_g',
hue='bill_length_mm',
style='island'
)
# Add any additional customization or annotations here
plt.show()
# Create a scatter plot using seaborn
sns.scatterplot(
data=penguins, # Specify the data
x='flipper_length_mm', # Set the x-axis variable
y='body_mass_g', # Set the y-axis variable
hue='species', # Set the variable for color
palette='Dark2' # Set the color palette to 'Dark2'
)
plt.show() # Display the plot
import brewer2mpl
# Print all available color palettes
brewer2mpl.print_maps()
Sequential
Blues : {3, 4, 5, 6, 7, 8, 9}
BuGn : {3, 4, 5, 6, 7, 8, 9}
BuPu : {3, 4, 5, 6, 7, 8, 9}
GnBu : {3, 4, 5, 6, 7, 8, 9}
Greens : {3, 4, 5, 6, 7, 8, 9}
Greys : {3, 4, 5, 6, 7, 8, 9}
OrRd : {3, 4, 5, 6, 7, 8, 9}
Oranges : {3, 4, 5, 6, 7, 8, 9}
PuBu : {3, 4, 5, 6, 7, 8, 9}
PuBuGn : {3, 4, 5, 6, 7, 8, 9}
PuRd : {3, 4, 5, 6, 7, 8, 9}
Purples : {3, 4, 5, 6, 7, 8, 9}
RdPu : {3, 4, 5, 6, 7, 8, 9}
Reds : {3, 4, 5, 6, 7, 8, 9}
YlGn : {3, 4, 5, 6, 7, 8, 9}
YlGnBu : {3, 4, 5, 6, 7, 8, 9}
YlOrBr : {3, 4, 5, 6, 7, 8, 9}
YlOrRd : {3, 4, 5, 6, 7, 8, 9}
Diverging
BrBG : {3, 4, 5, 6, 7, 8, 9, 10, 11}
PRGn : {3, 4, 5, 6, 7, 8, 9, 10, 11}
PiYG : {3, 4, 5, 6, 7, 8, 9, 10, 11}
PuOr : {3, 4, 5, 6, 7, 8, 9, 10, 11}
RdBu : {3, 4, 5, 6, 7, 8, 9, 10, 11}
RdGy : {3, 4, 5, 6, 7, 8, 9, 10, 11}
RdYlBu : {3, 4, 5, 6, 7, 8, 9, 10, 11}
RdYlGn : {3, 4, 5, 6, 7, 8, 9, 10, 11}
Spectral : {3, 4, 5, 6, 7, 8, 9, 10, 11}
Qualitative
Accent : {3, 4, 5, 6, 7, 8}
Dark2 : {3, 4, 5, 6, 7, 8}
Paired : {3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Pastel1 : {3, 4, 5, 6, 7, 8, 9}
Pastel2 : {3, 4, 5, 6, 7, 8}
Set1 : {3, 4, 5, 6, 7, 8, 9}
Set2 : {3, 4, 5, 6, 7, 8}
Set3 : {3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
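If brewer2mpl is not installed, Seaborn itself can look up ColorBrewer palettes by name. The sketch below is not part of the original notebook. It samples a few colours from two of the palettes listed above, and the variable names are hypothetical.

```python
import seaborn as sns

# Ask Seaborn for three colours from the qualitative "Dark2" palette
dark2 = sns.color_palette("Dark2", 3)
dark2_hex = dark2.as_hex()
print(dark2_hex)

# Sequential palettes such as "Blues" can be sampled the same way
blues = sns.color_palette("Blues", 5)
print(len(blues))
```

In a notebook, evaluating `dark2` on its own line should also render the colours as swatches.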
# Create scatter plot using seaborn and viridis
sns.scatterplot(
data=penguins, # Data frame containing the data
x='flipper_length_mm', # Variable for the x-axis
y='body_mass_g', # Variable for the y-axis
hue='species', # Variable to differentiate the colors
palette='viridis' # Color map to use for the plot
)
# Display the plot
plt.show()
# Create scatter plot using Seaborn
sns.scatterplot(
data=penguins,
x='flipper_length_mm',
y='body_mass_g',
hue='species',
style='species',
palette='plasma',
markers={'Adelie': 's', 'Gentoo': 'o', 'Chinstrap': '^'},
)
# Set x-axis breaks
plt.xticks([170, 200, 230])
# Set y-axis to logarithmic scale
plt.yscale('log')
# Display the plot
plt.show()
# Create facets for each species
g = sns.FacetGrid(penguins, col='species')
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')
# Display the plot
plt.show()
# Create a wrapped grid of facets, similar to facet_wrap in ggplot2
# The col_wrap parameter of FacetGrid sets the maximum number of columns before the facets wrap onto a new row
g = sns.FacetGrid(penguins, col='species', col_wrap=3, sharex=False, sharey=False)
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')
# Display the plot
plt.show()
g = sns.FacetGrid(penguins, col='species', row='sex')
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')
# Display the plot
plt.show()
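For scatter and line plots, the figure-level function relplot() combines the FacetGrid and map steps into one call. A minimal sketch on a toy DataFrame with made-up values (not real measurements) is given below. The axis labels are hypothetical choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Toy stand-in for penguins (made-up values, for illustration only)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "flipper_length_mm": [181.0, 190.0, 192.0, 200.0, 215.0, 222.0],
    "body_mass_g": [3700.0, 3900.0, 3600.0, 3800.0, 5000.0, 5400.0],
})

# relplot builds the FacetGrid and maps scatterplot onto it in one call
g = sns.relplot(
    data=toy,
    x="flipper_length_mm",
    y="body_mass_g",
    col="species",
    kind="scatter",
    height=3,
)

# The returned object is a FacetGrid, so its helper methods are available
g.set_axis_labels("Flipper length (mm)", "Body mass (g)")
g.set_titles(col_template="{col_name}")

print(g.axes.shape)
```

Adding row="sex" to the call would give a grid like the row-by-column FacetGrid example above.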
sns.countplot(data=penguins, x='species')
plt.show()
# Coordinate flip: map species to y for horizontal bars
sns.countplot(data=penguins, y='species', palette='Set1')
plt.show()
Seaborn provides several built-in themes that you can use to customize the appearance of your plots.
These are complete themes which control all non-data display.
Here are some of the available themes:
"darkgrid": Dark background with gridlines.
"whitegrid": White background with gridlines.
"dark": Dark background without gridlines.
"white": White background without gridlines.
"ticks": White background with tick marks.
# Set the theme
sns.set_theme(style="darkgrid")
# Create the scatter plot
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue='species', style='species')
# Display the plot
plt.show()
# Set the theme
sns.set_theme(style="whitegrid")
# Create the scatter plot
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue='species', style='species')
# Display the plot
plt.show()
# Set the theme
sns.set_theme(style="dark")
# Create the scatter plot
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue='species', style='species')
# Display the plot
plt.show()
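sns.set_theme() changes the style for every later plot in the session. To style a single figure only, the axes_style() context manager can be used instead. The sketch below is not from the original notebook. It plots the four complete rows from penguins.head(), and the variable names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; omit this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Inspect the grid setting behind two of the themes
grid_on = sns.axes_style("whitegrid")["axes.grid"]
grid_off = sns.axes_style("white")["axes.grid"]
print(grid_on, grid_off)

# Apply a theme to one figure only; global settings are restored afterwards
with sns.axes_style("ticks"):
    fig, ax = plt.subplots()
    sns.scatterplot(
        x=[181.0, 186.0, 195.0, 193.0],      # flipper_length_mm
        y=[3750.0, 3800.0, 3250.0, 3450.0],  # body_mass_g
        ax=ax,
    )
    sns.despine(ax=ax)  # remove the top and right spines
```

The "ticks" style is often paired with despine() for a cleaner frame.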
# Create the figure and subplots
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
# Scatter plot
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue='species', ax=axs[0, 0])
# Box plot
sns.boxplot(data=penguins, x='species', y='bill_depth_mm', ax=axs[0, 1])
# Histogram with KDE
sns.histplot(data=penguins, x='body_mass_g', hue='species', element='step', kde=True, ax=axs[1, 0])
# Remove empty subplot
fig.delaxes(axs[1, 1])
# Adjust the layout
plt.tight_layout()
# Add a plot title for the entire panel
fig.suptitle("Penguin Data Analysis", fontsize=16, y=1)
# Show the plot
plt.show()
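In the panel above, tight_layout() runs before the overall title is added, and the title is then positioned by hand with y=1. One possible alternative, sketched here with plain Matplotlib under the assumption that constrained layout is available in the installed version, lets Matplotlib reserve space for the title itself. The data are the four complete rows from penguins.head().

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; omit this line in a notebook
import matplotlib.pyplot as plt

# The four complete rows from penguins.head()
flipper = [181.0, 186.0, 195.0, 193.0]
mass = [3750.0, 3800.0, 3250.0, 3450.0]
bill_depth = [18.7, 17.4, 18.0, 19.3]

# constrained_layout makes room for titles and labels automatically
fig, axs = plt.subplots(2, 2, figsize=(10, 8), constrained_layout=True)

axs[0, 0].scatter(flipper, mass)   # scatter plot
axs[0, 1].boxplot(bill_depth)      # box plot
axs[1, 0].hist(mass, bins=3)       # histogram

# Remove the empty subplot
fig.delaxes(axs[1, 1])

# No manual y offset is needed for the overall title
fig.suptitle("Penguin Data Analysis", fontsize=16)
```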
# Create the figure and grid layout
fig = plt.figure(figsize=(10, 8)) # Create a new figure object with size 10x8 inches
gs = fig.add_gridspec(2, 2) # Add a grid specification with 2 rows and 2 columns
# Scatter plot
ax1 = fig.add_subplot(gs[0, 0])
sns.scatterplot(data=penguins, x='flipper_length_mm', y='body_mass_g', hue='species', ax=ax1)
ax1.text(0.05, 0.9, 'A', transform=ax1.transAxes, fontsize=16, fontweight='bold')
# Modify the legend title and position
ax1.legend_.set_title("Penguin Species")
ax1.legend_.set_bbox_to_anchor((1.1, 1.0)) # Adjust the bbox_to_anchor value to change the legend position
# Box plot
ax2 = fig.add_subplot(gs[0, 1])
sns.boxplot(data=penguins, x='species', y='bill_depth_mm', ax=ax2)
ax2.text(0.05, 0.9, 'B', transform=ax2.transAxes, fontsize=16, fontweight='bold')
# Histogram with KDE
ax3 = fig.add_subplot(gs[1, :]) # The colon selects all columns, so this plot spans the whole second row
sns.histplot(data=penguins, x='body_mass_g', hue='species', element='step', kde=True, ax=ax3)
ax3.text(0.05, 0.9, 'C', transform=ax3.transAxes, fontsize=16, fontweight='bold')
# Adjust the layout
fig.tight_layout()
# Add the plot annotation and adjust the position
fig.suptitle('Size measurements for adult foraging penguins near Palmer Station, Antarctica', fontsize=16, y=1)
#fig.text(0.05, 0.05, 'A', fontsize=16, fontweight='bold')
# Show the plot
plt.show()
import plotly.express as px
# Recreate the scatter plot as an interactive Plotly figure
scatterplot_plotly = px.scatter(penguins, x='flipper_length_mm', y='body_mass_g', color='species')
# Display the interactive plot
scatterplot_plotly.show()
# Create a scatter plot using Plotly Express
fig = px.scatter(
penguins, # Dataset
x="flipper_length_mm", # X-axis variable
y="body_mass_g", # Y-axis variable
color="species", # Variable used for color differentiation
marginal_y="violin", # Add a violin plot on the y-axis margin
marginal_x="box", # Add a box plot on the x-axis margin
trendline="ols", # Add an ordinary least squares (OLS) trendline per species (requires statsmodels)
template="simple_white" # Use the "simple_white" plot template
)
# Customize the layout of the plot
fig.update_layout(
xaxis_title="Flipper Length (mm)", # Set the x-axis title
yaxis_title="Body Mass (g)" # Set the y-axis title
)
# Display the plot
fig.show()