Chapter 15: Data Analysis with pandas & NumPy

Learn how to manipulate numerical arrays with NumPy and tabular data with pandas, including aggregation and basic plotting.

Objectives

Create and manipulate numpy.ndarray objects.
Load and inspect data using pandas.Series and DataFrame.
Perform indexing, slicing, and boolean selection in pandas.
Compute summary statistics and group-by aggregations.
Handle missing data and transform columns with apply.
Visualize data with pandas' built-in plotting, which is built on matplotlib.

1. NumPy Arrays

import numpy as np

# 1D array
arr = np.array([1, 2, 3, 4])
# 2D array
mat = np.arange(9).reshape(3, 3)
print(arr)
print(mat)

# elementwise operations
print(arr + 10)
# statistics
print("mean:", arr.mean(), "sum:", arr.sum())
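
The statistics above reduce the whole array to a single number. On a 2D array the same methods take an axis argument, and elementwise operations extend to arrays of different shapes through broadcasting. The sketch below is illustrative only; the names col_sums, row_means, shifted, and evens are not part of the chapter's code.

```python
import numpy as np

mat = np.arange(9).reshape(3, 3)   # [[0 1 2], [3 4 5], [6 7 8]]

# axis=0 collapses the rows, giving one value per column
col_sums = mat.sum(axis=0)         # [ 9 12 15]
# axis=1 collapses the columns, giving one value per row
row_means = mat.mean(axis=1)       # [1. 4. 7.]

# broadcasting: the 1D array is added to every row of mat
shifted = mat + np.array([10, 20, 30])

# a boolean mask selects the elements that satisfy a condition
evens = mat[mat % 2 == 0]          # [0 2 4 6 8]

print(col_sums, row_means, evens)
print(shifted)
```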

2. pandas Series & DataFrame

import pandas as pd

# create Series
s = pd.Series([10, 20, 30], index=['a','b','c'])
# create DataFrame
df = pd.DataFrame({
    'col1': [1,2,3],
    'col2': ['x','y','z']
})
print(s)
print(df.head())

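# read CSV (assumes data.csv exists and has 'col1' and 'col2' columns,
# which the remaining sections rely on)
df = pd.read_csv('data.csv')
# info() prints its summary directly and returns None, so it is not wrapped in print()
df.info()
print(df.describe())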
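
If no data.csv is at hand, read_csv also accepts a file-like object, so the same inspection steps can be tried on an in-memory string. This is a minimal sketch; the sample rows in csv_text are made up for illustration and do not come from the chapter's data file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = """col1,col2
1,x
2,y
,x
4,y
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(df.dtypes)         # col1 is float64 because one value is missing
print(df.isna().sum())   # number of missing values per column
```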

3. Indexing & Selection

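# label-based: row with index label 0, column 'col1'
print(df.loc[0, 'col1'])
# position-based: first two rows and first two columns (end of slice is excluded)
print(df.iloc[0:2, 0:2])
# boolean mask: keep only the rows where col1 is greater than 1
print(df[df['col1'] > 1])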
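
Boolean masks can be combined, and .loc accepts a mask for the rows together with a column label. The sketch below rebuilds the small example DataFrame from section 2 so that it runs on its own; the names both, names, and subset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['x', 'y', 'z']})

# combine conditions with & (and) or | (or); each condition needs parentheses
both = df[(df['col1'] > 1) & (df['col2'] != 'z')]

# .loc takes a mask for the rows and a label for the column
names = df.loc[df['col1'] >= 2, 'col2']

# isin tests membership in a list of values
subset = df[df['col2'].isin(['x', 'z'])]

print(both)
print(names)
print(subset)
```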

4. Aggregation & GroupBy

# summary statistics
print(df['col1'].mean(), df['col1'].sum())

# group by
grouped = df.groupby('col2')['col1'].agg(['mean','count'])
print(grouped)
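
When the result columns need descriptive names, pandas also supports named aggregation, where each keyword maps to a (column, function) pair. A minimal sketch on made-up data; the output column names total, average, and n are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': ['x', 'y', 'x', 'y'],
})

# each keyword becomes an output column: name=(input column, aggregation)
summary = df.groupby('col2').agg(
    total=('col1', 'sum'),
    average=('col1', 'mean'),
    n=('col1', 'count'),
)
print(summary)
```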

5. Missing Data & apply()

# fill missing values (NaN) in col1 with 0
df['col1'] = df['col1'].fillna(0)
# apply calls the function on each element; df['col1'] ** 2 is the vectorized equivalent
df['col3'] = df['col1'].apply(lambda x: x**2)
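
Filling with 0 is only one option. Depending on the data, it may make more sense to count the missing values first, drop the affected rows, or fill with a statistic such as the mean. The sketch below uses a small made-up DataFrame and also checks that apply and the vectorized form agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0, np.nan, 3.0], 'col2': ['x', 'y', 'z']})

n_missing = df['col1'].isna().sum()             # count the missing values
dropped = df.dropna(subset=['col1'])            # remove rows where col1 is missing
filled = df['col1'].fillna(df['col1'].mean())   # mean() skips NaN by default

# apply calls the function once per element
squared_apply = filled.apply(lambda x: x ** 2)
# the vectorized form gives the same result and is usually faster
squared_vec = filled ** 2

print(n_missing, len(dropped))
print(filled.tolist(), squared_vec.tolist())
```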

6. Basic Plotting

import matplotlib.pyplot as plt

# line plot
df['col1'].plot(title='Col1 over index')
plt.show()

# bar plot
df.groupby('col2')['col1'].sum().plot(kind='bar')
plt.show()
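
The plot method returns a matplotlib Axes object, so labels can be added and the figure written to disk instead of shown in a window. This is a sketch under some assumptions: the data is made up, the Agg backend is chosen so that it runs without a display, and the output path is a hypothetical file in the system's temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': ['x', 'y', 'x', 'y']})

# .plot returns a matplotlib Axes, which can be customized further
ax = df.groupby('col2')['col1'].sum().plot(kind='bar', title='Sum of col1 by col2')
ax.set_xlabel('col2')
ax.set_ylabel('sum of col1')

# hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.gettempdir(), 'col1_by_col2.png')
ax.figure.savefig(out_path)
plt.close(ax.figure)   # release the figure once it has been saved
```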

Exercises

Load the Iris dataset into a DataFrame and compute the average petal length by species.
Identify and drop rows with missing values in a sample CSV.
Plot a histogram of a numeric column and save it to a file.
Use apply to normalize a column (min–max scaling).