Unit 9 • Lesson 9

Introduction to File Formats (CSV, JSON, and Excel)

Overview

Data often comes in structured file formats. You'll practice reading and writing CSV, JSON, and Excel files with Python libraries like csv and pandas to prepare for real-world data handling, working with common data sources.

Intermediate 30–40 min

What You Will Learn in This Lesson

By the end of this lesson, you will know:

  • CSV files: Comma-separated values format for tabular data.
  • JSON files: Structured data format for APIs and configs.
  • Excel files: Spreadsheet format with formatting.
  • Reading files: How to read data from each format.
  • Writing files: How to write data to each format.

Common File Formats

Different file formats serve different purposes:

CSV

Simple text format, comma-separated values, good for spreadsheets and data exchange

JSON

Structured text format, hierarchical data, commonly used in APIs and configs

Excel

Binary format, complex formatting, good for rich spreadsheets with formulas

Working with CSV Files

CSV (Comma-Separated Values) files are simple text files that store tabular data. Python's built-in csv module makes working with CSV files easy:

Reading CSV with csv.reader()
import csv

# Reading CSV as lists
with open('data.csv', 'r') as file:
    reader = csv.reader(file)
    header = next(reader)  # Read header row
    print(f"Columns: {header}")
    
    for row in reader:
        print(row)  # Each row is a list
Reading CSV with csv.DictReader()
import csv

# Reading CSV as dictionaries (easier to work with)
with open('data.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(f"Name: {row['name']}, Age: {row['age']}")
        # Access columns by name instead of index
Writing CSV Files
import csv

# Writing CSV
data = [
    ['Name', 'Age', 'City'],
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)  # Write all rows at once

# Or write row by row
with open('output2.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Name', 'Age'])  # Header
    writer.writerow(['Alice', 25])
    writer.writerow(['Bob', 30])

CSV File Format

A CSV file looks like this:

Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago

Each line is a row, and values are separated by commas. The first line is usually the header.

Working with JSON Files

JSON files store structured data. We covered JSON parsing earlier, but here's how to work with JSON files:

Reading and Writing JSON Files
import json

# Reading JSON file
with open('data.json', 'r') as file:
    data = json.load(file)  # Loads entire file as Python object
    print(data)

# Writing JSON file
data_to_save = {
    'users': [
        {'name': 'Alice', 'age': 25},
        {'name': 'Bob', 'age': 30}
    ],
    'total': 2
}

with open('output.json', 'w') as file:
    json.dump(data_to_save, file, indent=2)  # indent=2 for pretty formatting

JSON vs CSV

JSON is better for nested/hierarchical data, while CSV is better for simple tabular data. JSON preserves data types (numbers, booleans), while CSV treats everything as text.

Working with Excel Files

For Excel files (.xlsx), use the pandas library, which provides powerful data manipulation capabilities:

Reading Excel Files
import pandas as pd

# Reading Excel file
df = pd.read_excel('data.xlsx')

# Display first few rows
print(df.head())

# Access specific columns
print(df['Name'])
print(df['Age'])

# Filter data
young_people = df[df['Age'] < 30]
print(young_people)
Writing Excel Files
import pandas as pd

# Create DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Write to Excel
df.to_excel('output.xlsx', index=False)  # index=False removes row numbers

# Write to specific sheet
df.to_excel('output.xlsx', sheet_name='Users', index=False)

Installing Required Libraries

For Excel files, install pandas and openpyxl:

pip install pandas openpyxl

pandas provides DataFrames for data manipulation, and openpyxl handles Excel file reading/writing.

When to Use Each Format

  • CSV: Simple tabular data, easy to share, works everywhere
  • JSON: Structured/nested data, APIs, configuration files
  • Excel: Complex spreadsheets with formatting, formulas, multiple sheets

Practice: Working with File Formats

Try It Yourself

Try simulating CSV and JSON operations:

Press Run to see output

What happened? You simulated working with CSV (tabular) and JSON (structured) data formats. In real programs, you would use csv module for CSV files and json module for JSON files.

Summary

In this lesson, you learned:

  • CSV: Simple text format for tabular data
  • JSON: Structured format for hierarchical data
  • Excel: Binary format for rich spreadsheets
  • Reading: Use csv module for CSV, pandas for Excel
  • Writing: Use csv module for CSV, pandas for Excel

Remember

Choose the right format for your data. CSV for simple tables, JSON for structured data exchange, Excel for complex spreadsheets with formatting.

End-of-Lesson Exercises

Think about these questions to reinforce what you've learned:

Exercise 1: File Formats

What are the differences between CSV, JSON, and Excel files? When would you use each?

Exercise 2: Working with Files

How do you read and write CSV files? How about Excel files?