Chapter 15: Data Analysis with pandas & NumPy

Learn how to manipulate numerical arrays with NumPy and tabular data with pandas, including aggregation and basic plotting.

Objectives

1. NumPy Arrays

import numpy as np

# 1D array
arr = np.array([1, 2, 3, 4])
# 2D array
mat = np.arange(9).reshape(3, 3)
print(arr)
print(mat)
# elementwise operations
print(arr + 10)
# statistics
print("mean:", arr.mean(), "sum:", arr.sum())
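
The statistics above reduce a whole array to a single number. As a small sketch of the same idea on the 2D array, the axis argument chooses whether to reduce per column or per row, and broadcasting extends the elementwise arithmetic to arrays of different shapes. The names col_sums, row_means, and shifted are illustrative only.

```python
import numpy as np

mat = np.arange(9).reshape(3, 3)   # rows: [0 1 2], [3 4 5], [6 7 8]

col_sums = mat.sum(axis=0)    # collapse the rows: one value per column
row_means = mat.mean(axis=1)  # collapse the columns: one value per row

print(col_sums)   # [ 9 12 15]
print(row_means)  # [1. 4. 7.]

# broadcasting: the 1D array is added to every row of the 2D array
shifted = mat + np.array([10, 20, 30])
print(shifted[0])  # [10 21 32]
```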

2. pandas Series & DataFrame

import pandas as pd

# create Series
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
# create DataFrame
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ['x', 'y', 'z'],
})
print(s)
print(df.head())
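# read CSV (this rebinds df; the later sections assume data.csv has 'col1' and 'col2' columns)
df = pd.read_csv('data.csv')
df.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()
print(df.describe())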
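
The CSV example depends on a data.csv file that is not shown here. As a sketch that runs without that file, the version below reads the same kind of data from an in-memory string; the CSV content and the csv_text name are assumptions made up for illustration, not the chapter's actual data.

```python
import io

import pandas as pd

# hypothetical CSV content standing in for data.csv
csv_text = "col1,col2\n1,x\n2,y\n3,x\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 2)
print(list(df.columns))  # ['col1', 'col2']
df.info()                # prints column names, non-null counts and dtypes
print(df.describe())     # count, mean, std, min, quartiles, max for numeric columns
```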

3. Indexing & Selection

# label-based: row label 0, column 'col1'
print(df.loc[0, 'col1'])
# position-based: first two rows, first two columns
print(df.iloc[0:2, 0:2])
# boolean mask
print(df[df['col1'] > 1])
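
With the default integer index, labels and positions happen to coincide, which can hide the difference between loc and iloc. The sketch below uses hypothetical row labels ('r1' to 'r3') to make the distinction visible, and shows one way to combine two conditions in a boolean mask.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': ['x', 'y', 'z']},
    index=['r1', 'r2', 'r3'],  # hypothetical row labels
)

by_label = df.loc['r2', 'col1']  # row labelled 'r2'
by_position = df.iloc[1, 0]      # second row, first column
print(by_label, by_position)     # 2 2

# .loc slices include the end label; .iloc slices exclude the end position
print(len(df.loc['r1':'r2']))    # 2
print(len(df.iloc[0:1]))         # 1

# combine conditions with & (and) or | (or); each condition needs parentheses
mask = (df['col1'] > 1) & (df['col2'] != 'z')
print(df[mask].index.tolist())   # ['r2']
```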

4. Aggregation & GroupBy

# summary statistics
print(df['col1'].mean(), df['col1'].sum())

# group by
grouped = df.groupby('col2')['col1'].agg(['mean', 'count'])
print(grouped)
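
To see what the grouped result looks like, it helps to run the same call on data where each group has more than one row. The sample values below are invented for illustration; the second part sketches named aggregation, an alternative spelling that gives the result columns explicit names.

```python
import pandas as pd

# made-up sample data with two rows per group
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': ['x', 'y', 'x', 'y'],
})

# one row per distinct col2 value, one column per aggregation
grouped = df.groupby('col2')['col1'].agg(['mean', 'count'])
print(grouped.loc['x', 'mean'])   # 2.0
print(grouped.loc['y', 'count'])  # 2

# named aggregation: result_name=(source_column, function)
named = df.groupby('col2').agg(total=('col1', 'sum'), largest=('col1', 'max'))
print(named.loc['y', 'total'])    # 6
```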

5. Missing Data & apply()

# fill missing values
df['col1'] = df['col1'].fillna(0)
# apply a function to every element (here: square each value)
df['col3'] = df['col1'].apply(lambda x: x**2)
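
Before filling missing values it is usually worth checking how many there are, and dropping rows is sometimes a better fit than filling. The sketch below uses a made-up column with one NaN to show both options, and compares apply with the equivalent vectorized expression, which gives the same result for simple arithmetic.

```python
import numpy as np
import pandas as pd

# made-up column with one missing value
df = pd.DataFrame({'col1': [1.0, np.nan, 3.0]})

print(df['col1'].isna().sum())        # 1

filled = df['col1'].fillna(0)         # replace NaN with 0
dropped = df.dropna(subset=['col1'])  # or drop rows where col1 is missing

# apply calls the function once per element ...
squared_apply = filled.apply(lambda x: x ** 2)
# ... while the vectorized form operates on the whole column at once
squared_vec = filled ** 2
print(squared_apply.equals(squared_vec))  # True
```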

6. Basic Plotting

import matplotlib.pyplot as plt

# line plot
df['col1'].plot(title='Col1 over index')
plt.show()

# bar plot
df.groupby('col2')['col1'].sum().plot(kind='bar')
plt.show()
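
plt.show() opens a window, which is not available in every environment (for example on a server or in a script run from a scheduler). As a sketch under that assumption, the version below selects the non-interactive Agg backend, labels the axes, and writes the bar plot to a file instead. The sample data and the file name col1_by_col2.png are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

# made-up sample data
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': ['x', 'y', 'x', 'y']})

# create the figure explicitly so it can be saved and closed later
fig, ax = plt.subplots()
df.groupby('col2')['col1'].sum().plot(kind='bar', ax=ax, title='Sum of col1 by col2')
ax.set_xlabel('col2')
ax.set_ylabel('sum of col1')

# hypothetical output location in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), 'col1_by_col2.png')
fig.savefig(out_path)
plt.close(fig)  # release the figure's memory
```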

Exercises

  1. Load the Iris dataset into a DataFrame and compute the average petal length by species.
  2. Identify and drop rows with missing values in a sample CSV.
  3. Plot a histogram of a numeric column and save it to a file.
  4. Use apply to normalize a column (min–max scaling).