Introduction to File Formats (CSV, JSON, and Excel)
Overview
Data often comes in structured file formats. You'll practice reading and writing CSV, JSON, and Excel files with Python libraries like csv and pandas to prepare for real-world data handling, working with common data sources.
What You Will Learn in This Lesson
By the end of this lesson, you will know:
- CSV files: Comma-separated values format for tabular data.
- JSON files: Structured data format for APIs and configs.
- Excel files: Spreadsheet format with formatting.
- Reading files: How to read data from each format.
- Writing files: How to write data to each format.
Common File Formats
Different file formats serve different purposes:
CSV
Simple text format, comma-separated values, good for spreadsheets and data exchange
JSON
Structured text format, hierarchical data, commonly used in APIs and configs
Excel
Binary format, complex formatting, good for rich spreadsheets with formulas
Working with CSV Files
CSV (Comma-Separated Values) files are simple text files that store tabular data. Python's built-in csv module makes working with CSV files easy:
import csv
# Reading CSV as lists
with open('data.csv', 'r') as file:
reader = csv.reader(file)
header = next(reader) # Read header row
print(f"Columns: {header}")
for row in reader:
print(row) # Each row is a list
import csv
# Reading CSV as dictionaries (easier to work with)
with open('data.csv', 'r') as file:
reader = csv.DictReader(file)
for row in reader:
print(f"Name: {row['name']}, Age: {row['age']}")
# Access columns by name instead of index
import csv
# Writing CSV
data = [
['Name', 'Age', 'City'],
['Alice', 25, 'New York'],
['Bob', 30, 'Los Angeles'],
['Charlie', 35, 'Chicago']
]
with open('output.csv', 'w', newline='') as file:
writer = csv.writer(file)
writer.writerows(data) # Write all rows at once
# Or write row by row
with open('output2.csv', 'w', newline='') as file:
writer = csv.writer(file)
writer.writerow(['Name', 'Age']) # Header
writer.writerow(['Alice', 25])
writer.writerow(['Bob', 30])
CSV File Format
A CSV file looks like this:
Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
Each line is a row, and values are separated by commas. The first line is usually the header.
Working with JSON Files
JSON files store structured data. We covered JSON parsing earlier, but here's how to work with JSON files:
import json
# Reading JSON file
with open('data.json', 'r') as file:
data = json.load(file) # Loads entire file as Python object
print(data)
# Writing JSON file
data_to_save = {
'users': [
{'name': 'Alice', 'age': 25},
{'name': 'Bob', 'age': 30}
],
'total': 2
}
with open('output.json', 'w') as file:
json.dump(data_to_save, file, indent=2) # indent=2 for pretty formatting
JSON vs CSV
JSON is better for nested/hierarchical data, while CSV is better for simple tabular data. JSON preserves data types (numbers, booleans), while CSV treats everything as text.
Working with Excel Files
For Excel files (.xlsx), use the pandas library, which provides powerful data manipulation capabilities:
import pandas as pd
# Reading Excel file
df = pd.read_excel('data.xlsx')
# Display first few rows
print(df.head())
# Access specific columns
print(df['Name'])
print(df['Age'])
# Filter data
young_people = df[df['Age'] < 30]
print(young_people)
import pandas as pd
# Create DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Write to Excel
df.to_excel('output.xlsx', index=False) # index=False removes row numbers
# Write to specific sheet
df.to_excel('output.xlsx', sheet_name='Users', index=False)
Installing Required Libraries
For Excel files, install pandas and openpyxl:
pip install pandas openpyxl
pandas provides DataFrames for data manipulation, and openpyxl handles Excel file reading/writing.
When to Use Each Format
- CSV: Simple tabular data, easy to share, works everywhere
- JSON: Structured/nested data, APIs, configuration files
- Excel: Complex spreadsheets with formatting, formulas, multiple sheets
Practice: Working with File Formats
Try It YourselfTry simulating CSV and JSON operations:
What happened? You simulated working with CSV (tabular) and JSON (structured) data formats. In real programs, you would use csv module for CSV files and json module for JSON files.
Summary
In this lesson, you learned:
- CSV: Simple text format for tabular data
- JSON: Structured format for hierarchical data
- Excel: Binary format for rich spreadsheets
- Reading: Use csv module for CSV, pandas for Excel
- Writing: Use csv module for CSV, pandas for Excel
Remember
Choose the right format for your data. CSV for simple tables, JSON for structured data exchange, Excel for complex spreadsheets with formatting.
End-of-Lesson Exercises
Think about these questions to reinforce what you've learned:
Exercise 1: File Formats
What are the differences between CSV, JSON, and Excel files? When would you use each?
Exercise 2: Working with Files
How do you read and write CSV files? How about Excel files?