Chapter 15: Data Analysis with pandas & NumPy
Learn how to manipulate numerical arrays with NumPy and tabular data with pandas, including aggregation and basic plotting.
Download: chapter15.py
Objectives
- Create and manipulate numpy.ndarray objects.
- Load and inspect data using pandas.Series and DataFrame.
- Perform indexing, slicing, and boolean selection in pandas.
- Compute summary statistics and group-by aggregations.
- Handle missing data and apply functions with apply.
- Visualize data with built-in pandas plotting (matplotlib).
1. NumPy Arrays
import numpy as np
# 1D array
arr = np.array([1, 2, 3, 4])
# 2D array
mat = np.arange(9).reshape(3, 3)
print(arr)
print(mat)
# elementwise operations
print(arr + 10)
# statistics
print("mean:", arr.mean(), "sum:", arr.sum())
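The statistics above reduce the whole array to one number. On a 2D array the same methods accept an axis argument, and arithmetic between arrays of different shapes follows NumPy's broadcasting rules. A minimal sketch, reusing mat from above (the offsets array is a made-up illustration):

```python
import numpy as np

mat = np.arange(9).reshape(3, 3)   # rows: [0 1 2], [3 4 5], [6 7 8]

# axis=0 collapses the rows, giving one value per column
col_sums = mat.sum(axis=0)
# axis=1 collapses the columns, giving one value per row
row_means = mat.mean(axis=1)

# broadcasting: the length-3 array is added to every row of mat
offsets = np.array([10, 20, 30])   # hypothetical values for illustration
shifted = mat + offsets

print(col_sums)    # [ 9 12 15]
print(row_means)   # [1. 4. 7.]
print(shifted)
```

Broadcasting works here because the trailing dimension of mat (3) matches the length of offsets.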
2. pandas Series & DataFrame
import pandas as pd
# create Series
s = pd.Series([10, 20, 30], index=['a','b','c'])
# create DataFrame
df = pd.DataFrame({
'col1': [1,2,3],
'col2': ['x','y','z']
})
print(s)
print(df.head())
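# read CSV (this replaces the df created above; the later sections assume data.csv has columns col1 and col2)
df = pd.read_csv('data.csv')
# info() prints its summary directly and returns None, so it is not wrapped in print()
df.info()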
print(df.describe())
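The contents of data.csv are not shown in this chapter. To try the loading and inspection calls without that file, one option is to read CSV text from an in-memory buffer. The sketch below assumes a file with columns col1 and col2 and one missing value; the rows are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv (the real file is not shown in the chapter)
csv_text = """col1,col2
1,x
2,y
,x
4,y
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(df.dtypes)         # col1 is read as float64 because of the empty field
print(df.isna().sum())   # count of missing values per column
```

The empty field in the third row is read as NaN, which is why an integer-looking column ends up with a float dtype.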
3. Indexing & Selection
# label-based
print(df.loc[0, 'col1'])
# position-based
print(df.iloc[0:2, 0:2])
# boolean mask
print(df[df['col1'] > 1])
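With the default integer index, loc and iloc can look interchangeable. The difference shows up once the rows have their own labels. The following sketch uses made-up row labels r1 to r3 and also shows how two boolean conditions might be combined.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': ['x', 'y', 'z']},
    index=['r1', 'r2', 'r3'],   # hypothetical row labels
)

# loc selects by label, and a label slice includes both endpoints
by_label = df.loc['r1':'r2', 'col1']
# iloc selects by integer position, and the stop position is excluded
by_position = df.iloc[0:2, 0]

# combine conditions with & (and) or | (or); each condition needs its own parentheses
mask = (df['col1'] > 1) & (df['col2'] != 'z')
selected = df[mask]

print(by_label)
print(by_position)
print(selected)
```

Both slices return the first two rows here, but only because 'r1':'r2' is inclusive while 0:2 stops before position 2.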
4. Aggregation & GroupBy
# summary statistics
print(df['col1'].mean(), df['col1'].sum())
# group by
grouped = df.groupby('col2')['col1'].agg(['mean','count'])
print(grouped)
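Passing a list to agg names the result columns after the functions. When clearer column names are wanted, pandas also supports named aggregation. A small sketch on invented data (the output names col1_mean and col1_count are arbitrary choices):

```python
import pandas as pd

# Small made-up table with two groups in col2
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': ['x', 'y', 'x', 'y'],
})

# Each keyword is an output column name; the tuple is (input column, function)
summary = df.groupby('col2').agg(
    col1_mean=('col1', 'mean'),
    col1_count=('col1', 'count'),
)
print(summary)
```

The result is indexed by the group keys x and y, with one column per keyword argument.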
5. Missing Data & apply()
# fill missing values
df['col1'] = df['col1'].fillna(0)
# apply function
df['col3'] = df['col1'].apply(lambda x: x**2)
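Filling with 0 is only one possible strategy, and it can distort statistics such as the mean. The sketch below, on invented data, counts the missing values first, fills with the column mean instead, and shows that for simple arithmetic a vectorized expression gives the same result as apply.

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value
df = pd.DataFrame({'col1': [1.0, np.nan, 3.0], 'col2': ['x', 'y', 'z']})

# count missing values before changing anything
n_missing = df['col1'].isna().sum()

# mean() skips NaN by default, so the fill value here is 2.0
filled = df['col1'].fillna(df['col1'].mean())

# apply calls the function once per element
squared_apply = filled.apply(lambda x: x ** 2)
# the vectorized form operates on the whole column at once
squared_vec = filled ** 2

print(n_missing)
print(squared_apply.equals(squared_vec))
```

apply remains useful when the per-element logic cannot be written as a column expression.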
6. Basic Plotting
import matplotlib.pyplot as plt
# line plot
df['col1'].plot(title='Col1 over index')
plt.show()
# bar plot
df.groupby('col2')['col1'].sum().plot(kind='bar')
plt.show()
Exercises
- Load the Iris dataset into a DataFrame and compute the average petal length by species.
- Identify and drop rows with missing values in a sample CSV.
- Plot a histogram of a numeric column and save it to a file.
- Use apply to normalize a column (min–max scaling).