Working with Python libraries¶

Dr. Priyanga D. Talagala

IASSL Workshop on Data visualization with R and

25-5-2023

Step 1: Install a pip package in the current Jupyter kernel¶

To install Python libraries, we use the pip command in the operating system's command-line console. In Jupyter, console commands can be run from a cell by putting the ‘!’ sign before the command. It is recommended to use Python's sys module here: sys.executable returns the path of the Python interpreter that the Jupyter kernel is running on, so pip is invoked for that same interpreter.

Syntax:

import sys

!{sys.executable} -m pip install [package_name]

With the code above, the package is installed into the same Python environment that the Jupyter notebook is running on.

In [60]:
# This is a comment
# import sys
# !{sys.executable} -m pip install pandas
# !{sys.executable} -m pip install palmerpenguins
# !{sys.executable} -m pip install matplotlib
# !{sys.executable} -m pip install seaborn
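
Before installing anything, it can help to check which interpreter the kernel is using and whether a package is already available. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and is not part of pip or Jupyter.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the current interpreter can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Path of the Python interpreter that runs this kernel
print(sys.executable)

# Report which of the workshop packages are already importable
for name in ["pandas", "palmerpenguins", "matplotlib", "seaborn"]:
    print(name, "installed" if is_installed(name) else "missing")
```

Any package reported as missing can then be installed with the !{sys.executable} -m pip install syntax shown above.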

Step 2: Importing libraries and dataset¶

Once a library is installed, import it into your application by adding an import statement:

  1. import [package_name]

    Typing package_name.foo in your code can be tedious. You can reduce the tedium by using import [package_name] as [pkg] and then typing pkg.foo.

Or

  1. from [package_name] import [foo]

    Here, foo can be used directly without the package prefix, but to use another item from the module you have to update your import statement.
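
As a small illustration of the import styles described above, the sketch below uses the standard-library math module in place of [package_name]; the variable names a, b and c are only for illustration.

```python
import math              # plain import: items are accessed as math.sqrt
import math as m         # import with an alias: items are accessed as m.sqrt
from math import sqrt    # import a single item: sqrt is used directly

a = math.sqrt(16)
b = m.sqrt(16)
c = sqrt(16)

print(a, b, c)  # 4.0 4.0 4.0
```

All three calls reach the same function; the styles differ only in how much you type and in which names end up in your namespace.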

Let's start by importing Pandas, which is a great library for managing relational (i.e. table-format) datasets:

In [61]:
# Pandas for managing datasets
# By convention, it is imported with the shorthand pd.
import pandas as pd

Next, we'll import Matplotlib, which will help us customize our plots further. Seaborn provides concise code for drawing complex statistical plots that are also aesthetically pleasing, and because Seaborn is built on top of Matplotlib, its plots can be refined further with Matplotlib functions.

In [62]:
# Matplotlib for additional customization
# Matplotlib is the whole package; pyplot is a module in matplotlib.
# By convention, it is imported with the shorthand plt.
from matplotlib import pyplot as plt

# Alternative way
# import matplotlib.pyplot as plt 
# It is generally customary to use `import matplotlib.pyplot as plt` 
# and suggested in the matplotlib documentation.

Then, we'll import the Seaborn library. Seaborn is a data visualization library built on top of matplotlib and closely integrated with pandas data structures in Python.

In [63]:
# Seaborn for plotting and styling
# By convention, it is imported with the shorthand sns.
import seaborn as sns

Tip: we gave each of our imported libraries an alias. Later, we can invoke Pandas with pd, Matplotlib with plt, and Seaborn with sns.

Now we're ready to import our dataset. You can import your CSV file using Pandas.

In [64]:
# Import dataset
df = pd.read_csv("data/titanic.csv")
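
pd.read_csv accepts a file path or any file-like object. The sketch below, which assumes pandas is installed, reads a few Titanic rows from an in-memory string so that it runs without the data/titanic.csv file; the names csv_text and sample are hypothetical, and the values are a subset of the rows and columns displayed in this notebook.

```python
import io

import pandas as pd

# Three rows and four columns taken from the Titanic data used in this notebook
csv_text = """PassengerId,Survived,Pclass,Age
1,0,3,22.0
2,1,1,38.0
3,1,3,26.0
"""

# io.StringIO wraps the string so that read_csv can treat it like a file
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)  # (3, 4)
```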

Here's what the dataset looks like:

In [65]:
# Display first 5 observations
df.head()
Out[65]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
In [67]:
# summary statistics
df.Age.describe()
Out[67]:
count    714.000000
mean      29.699118
std       14.526497
min        0.420000
25%       20.125000
50%       28.000000
75%       38.000000
max       80.000000
Name: Age, dtype: float64
In [ ]:
# Tip:
# to open the help page for a method, uncomment the line below
# help(df.describe)

The dot (.) in Python¶

  • In Python, the dot (.) is primarily used as a separator to access attributes and methods of objects.

  • This syntax is known as dot notation or dot operator.

  • When you have an object, such as a variable or an instance of a class, you can use the dot notation to access its attributes and methods.

  • For example, if you have a DataFrame named my_df, you can use dot notation to access various methods and attributes associated with DataFrames.

  • For instance, you can use

    • my_df.head() to retrieve the first few rows of the DataFrame
    • my_df.shape to obtain the dimensions of the DataFrame
    • my_df.columns to access the column names
  • The dot operator allows you to specify the DataFrame on which you want to perform actions or retrieve values, enabling you to manipulate and analyze the data effectively.
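
The points above can be tried out on a tiny DataFrame. This is a minimal sketch, assuming pandas is installed; my_df here is a hypothetical three-row frame built from the Sex and Age values of the first three passengers shown earlier.

```python
import pandas as pd

# Hypothetical small DataFrame for illustration only
my_df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
})

first_rows = my_df.head(2)   # method: called with parentheses
dims = my_df.shape           # attribute: no parentheses
cols = list(my_df.columns)   # attribute, converted to a plain list

print(dims)  # (3, 2)
print(cols)  # ['Sex', 'Age']

# A column can be reached with dot notation or with square brackets
same_column = my_df.Age.equals(my_df["Age"])
print(same_column)  # True
```

Methods such as head() perform an action and need parentheses, while attributes such as shape and columns simply hold values.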

Working with packages

  • Similarly, when working with modules or packages, you use the dot notation to access the objects defined within them.
  • For instance, if you have a module named math and you want to use the sqrt function from that module, you would write math.sqrt().
  • Overall, the dot (.) in Python is a crucial component of the language's syntax for accessing attributes, methods, and objects within modules, classes, and instances.
  • It helps organize and navigate the structure of Python programs and allows you to work with the properties and behavior associated with different objects.
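
The same idea can be sketched with standard-library modules. The example below shows the dot reaching a function in a module, a constant in a module, a function in a submodule, and a method of an object; the variable names are only for illustration.

```python
import math
import os.path

root = math.sqrt(25)                               # function defined in the math module
circumference = 2 * math.pi * 1.0                  # math.pi is a constant (attribute) of the module
file_name = os.path.basename("data/titanic.csv")   # function in the path submodule of os
upper_name = file_name.upper()                     # method of a string object

print(root)        # 5.0
print(file_name)   # titanic.csv
print(upper_name)  # TITANIC.CSV
```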

Note:

  • By assigning the result of pd.read_csv("data/titanic.csv") to the variable df, you create a DataFrame object that holds the data from the CSV file.
  • This DataFrame object, df, becomes the gateway to accessing various pandas capabilities for data analysis and manipulation.