Regular Expressions & File I/O
Learn to find patterns in text with regular expressions and automate reading/writing files — skills you'll use constantly in data processing.
Resources
Regular Expressions — Pattern Matching
Imagine you have a spreadsheet with 1,000 client entries. Some phone numbers are formatted as "+995-577-724-445", others as "577724445", others as "(577) 724 445". How do you find and standardize all of them?
Regular expressions (regex) let you search text for patterns, such as "three digits, a hyphen, three more digits", instead of one exact string.
import re
text = "Call me at 577-724-445 or 555-123-456"
# Find all phone-number patterns
phones = re.findall(r'\d{3}-\d{3}-\d{3}', text)
print(phones) # → ['577-724-445', '555-123-456']
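The example above uses `re.findall`. The examples in this section also use `re.match`, `re.search` and `re.sub`, and each lookup function differs in where it looks and what it returns. Here is a quick comparison on the same text, using only standard `re` functions:

```python
import re

text = "Call me at 577-724-445 or 555-123-456"
pattern = r'\d{3}-\d{3}-\d{3}'

# re.search: first match anywhere in the string (a Match object, or None)
first = re.search(pattern, text)
print(first.group())  # → 577-724-445

# re.match: only tries at the very start of the string
print(re.match(pattern, text))  # → None (the text starts with "Call")

# re.fullmatch: the whole string must fit the pattern
print(bool(re.fullmatch(pattern, "577-724-445")))  # → True

# re.findall: every non-overlapping match, as a list of strings
print(re.findall(pattern, text))  # → ['577-724-445', '555-123-456']
```

For validation (is this whole string a phone number?) use `re.fullmatch`, or `re.match` with `^...$` anchors. For extraction use `re.search` or `re.findall`.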
Core Regex Patterns
| Pattern | Meaning | Example Match |
|---------|---------|---------------|
| `\d` | Any digit | 0-9 |
| `\w` | Any word character (in Python 3 this includes Unicode letters, e.g. Georgian) | a-z, A-Z, 0-9, _ |
| `\s` | Any whitespace | space, tab, newline |
| `.` | Any character except a newline | anything but `\n` |
| `{3}` | Exactly 3 of the previous item | `\d{3}` matches "577" |
| `{2,4}` | 2 to 4 of the previous item | `\d{2,4}` matches "57", "577" or "5777" |
| `+` | One or more | `\d+` matches "577724445" |
| `*` | Zero or more | `\w*` matches "" or "hello" |
| `?` | Zero or one (optional) | `-?` matches "-" or nothing |
| `^` | Start of string | `^Hello` matches "Hello world" |
| `$` | End of string | `world$` matches "Hello world" |
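Here are a few rows from the table tried directly in Python. The sample strings are made up for illustration, and the comments show what each call prints:

```python
import re

# {2,4} is greedy: it grabs as many digits as it can, up to 4
print(re.findall(r'\d{2,4}', "5 57 577 57772"))  # → ['57', '577', '5777']

# -? makes the minus sign optional
print(re.findall(r'-?\d+', "Balance: -250, deposit: 400"))  # → ['-250', '400']

# \w+ matches runs of letters, digits and underscores
print(re.findall(r'\w+', "deal_2026: closed!"))  # → ['deal_2026', 'closed']

# . matches any character except a newline, unless you pass re.DOTALL
print(re.findall(r'a.c', "abc a-c a\nc"))             # → ['abc', 'a-c']
print(re.findall(r'a.c', "abc a-c a\nc", re.DOTALL))  # → ['abc', 'a-c', 'a\nc']

# ^ and $ tie the pattern to the start and end of the string
print(bool(re.search(r'^Hello', "Hello world")))  # → True
print(bool(re.search(r'^Hello', "Say Hello")))    # → False
print(bool(re.search(r'world$', "Hello world")))  # → True
```

Note how the trailing "2" in "57772" is left unmatched. After `\d{2,4}` takes "5777", a single digit is not enough for another match.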
Practical Examples
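import re
# Validate email (a simple sanity check, not a complete validator)
email = "lia@gecbusiness.com"
if re.match(r'^[\w.-]+@[\w.-]+\.\w+$', email):
    print("Valid email")
# Extract all prices from text (the group captures only the number)
text = "Products: $19.99, $24.50, and $7.00"
prices = re.findall(r'\$([\d.]+)', text)
print(prices) # → ['19.99', '24.50', '7.00']
# Replace runs of whitespace with a single space
messy = "Hello   World  How    Are You"
clean = re.sub(r'\s+', ' ', messy)
print(clean) # → Hello World How Are You
# Find Georgian phone numbers (various formats)
# (?:\+?995[\s-]?)? is an optional country code; then 3 digits (optionally in
# parentheses) and 6 more digits, each optionally preceded by a space or hyphen.
# (?:...) groups without capturing, so findall still returns whole matches.
contacts = "+995577724445, 595-11-22-33, (599) 123 456"
phones = re.findall(r'(?:\+?995[\s-]?)?\(?\d{3}\)?(?:[\s-]?\d){6}', contacts)
print(phones) # → ['+995577724445', '595-11-22-33', '(599) 123 456']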
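Finding the numbers solves only half of the opening scenario. The other half is standardizing them. The sketch below assumes every entry is a 9-digit Georgian number, with or without the 995 country code. It strips every non-digit with `re.sub`, drops the country code, and then regroups the digits using capture groups in the replacement string. The `normalize_phone` helper and the `+995 XXX XX XX XX` output format are hypothetical choices made for this example, not a standard.

```python
import re

def normalize_phone(raw):
    """Hypothetical helper: return '+995 XXX XX XX XX', or None if it doesn't fit.

    Assumes 9-digit Georgian numbers, optionally prefixed with 995.
    """
    digits = re.sub(r'\D', '', raw)  # \D = any non-digit, so this keeps digits only
    if len(digits) == 12 and digits.startswith('995'):
        digits = digits[3:]  # strip the country code
    if len(digits) != 9:
        return None  # unexpected length: flag for manual review
    # \1..\4 in the replacement refer to the four captured groups
    return re.sub(r'(\d{3})(\d{2})(\d{2})(\d{2})', r'+995 \1 \2 \3 \4', digits)

for raw in ["+995-577-724-445", "577724445", "(577) 724 445", "12345"]:
    print(raw, "->", normalize_phone(raw))
```

All three formats from the opening example come out as `+995 577 72 44 45`, and `"12345"` returns `None`. Returning `None` instead of guessing keeps bad rows visible for manual review.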
Groups — Extracting Parts of a Match
import re
text = "Order #12345 from Lamaria Honey for $50,000"
match = re.search(r'Order #(\d+) from (.+?) for \$(.*)', text)
if match:
    order_id = match.group(1) # → "12345"
    client = match.group(2) # → "Lamaria Honey"
    amount = match.group(3) # → "50,000"
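Parentheses `()` create groups that capture specific parts of the match. `match.group(1)` returns the first group, `match.group(2)` the second, and so on, while `match.group(0)` is the whole match. The `?` in `(.+?)` makes `.+` lazy: it matches as little as possible, so the client name stops at the first " for $" instead of the last one.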
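With several groups, numbers like `group(3)` are easy to mix up. Python's `re` also supports named groups, `(?P<name>...)`, and `re.finditer`, which yields one match object per occurrence. In this sketch the first order line comes from the text above. The second line is invented for the example:

```python
import re

orders = """Order #12345 from Lamaria Honey for $50,000
Order #12346 from Sadili LLC for $75,000"""

# (?P<name>...) labels each group so you can fetch it by name
pattern = r'Order #(?P<order_id>\d+) from (?P<client>.+?) for \$(?P<amount>[\d,]+)'

# finditer yields one Match object per order in the text
for m in re.finditer(pattern, orders):
    print(m.group('order_id'), m.group('client'), m.group('amount'))
# → 12345 Lamaria Honey 50,000
# → 12346 Sadili LLC 75,000

# groupdict() returns all named groups of one match at once
first = re.search(pattern, orders)
print(first.groupdict())
# → {'order_id': '12345', 'client': 'Lamaria Honey', 'amount': '50,000'}
```

The amounts are still strings. To do math with them, use `int(amount.replace(',', ''))`.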
File I/O — Reading and Writing Files
Automating file operations is one of Python's superpowers. Think of all the CSV exports, client lists, and reports you work with.
Reading Files
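# Read entire file
# encoding="utf-8" avoids the OS-dependent default, which can garble non-ASCII text
with open("clients.txt", "r", encoding="utf-8") as f:
    content = f.read()
print(content)
# Read line by line
with open("clients.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.strip()) # .strip() removes the trailing newline and surrounding spaces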
The `with` statement closes the file automatically when the block ends, even if an error happens inside it. Always use it.
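To see what `with` saves you from, here is roughly the same logic written by hand: a `try`/`finally` that always calls `close()`. The temporary file and its two names exist only so the example runs on its own:

```python
import os
import tempfile

# Throwaway file for the demo (a temporary path, not your real clients.txt)
path = os.path.join(tempfile.mkdtemp(), "clients.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Lamaria Honey\nSadili LLC\n")

# Roughly what `with open(...) as f:` does for you
f = open(path, "r", encoding="utf-8")
try:
    lines = [line.strip() for line in f]
finally:
    f.close()  # runs even if the loop above raises an exception

print(lines)     # → ['Lamaria Honey', 'Sadili LLC']
print(f.closed)  # → True
```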
Writing Files
# Write: creates the file, or overwrites it if it already exists
with open("report.txt", "w", encoding="utf-8") as f:
    f.write("Export Report\n")
    f.write("=" * 30 + "\n")
    f.write("Total deals: 5\n")
# Append: adds to the end of an existing file (creates it if missing)
with open("log.txt", "a", encoding="utf-8") as f:
    f.write("2026-04-20: New deal closed\n")
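The difference between `"w"` and `"a"` matters most when a script runs more than once. The demo below writes to a temporary log file twice with `"w"` and then appends once. Only the second write and the appended line survive:

```python
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "log.txt")  # temporary demo file

# "w" truncates the file each time it is opened
with open(log_path, "w", encoding="utf-8") as f:
    f.write("first run\n")
with open(log_path, "w", encoding="utf-8") as f:
    f.write("second run\n")  # "first run" is now gone

# "a" keeps existing content and writes at the end
with open(log_path, "a", encoding="utf-8") as f:
    f.write("2026-04-20: New deal closed\n")

with open(log_path, "r", encoding="utf-8") as f:
    content = f.read()
print(content)
# → second run
# → 2026-04-20: New deal closed
```

For a log that should grow across runs, use `"a"`. For a report that should reflect only the latest run, use `"w"`.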
Working with CSV Files
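CSV (Comma-Separated Values) is one of the most common data formats in business, and both Excel and Google Sheets can save to it. Python's built-in `csv` module reads and writes it: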
import csv
# Reading CSV: each row becomes a dict keyed by the header line
# newline="" is what the csv docs recommend, so quoted line breaks are handled correctly
with open("clients.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']}: ${row['deal_value']}") # every value is a string
# Writing CSV
clients = [
    {"name": "Lamaria Honey", "country": "France", "deal_value": 50000},
    {"name": "Sadili LLC", "country": "Germany", "deal_value": 75000},
]
with open("export_report.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "country", "deal_value"])
    writer.writeheader()
    writer.writerows(clients)
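Many first CSV scripts trip over one detail: `csv.DictReader` returns every value as a string, even numbers. The round trip below writes the two sample clients to a temporary file and reads them back. It then converts `deal_value` before adding, because `+` on strings would concatenate them instead of summing:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "clients.csv")  # temporary demo file
clients = [
    {"name": "Lamaria Honey", "country": "France", "deal_value": 50000},
    {"name": "Sadili LLC", "country": "Germany", "deal_value": 75000},
]
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "country", "deal_value"])
    writer.writeheader()
    writer.writerows(clients)

with open(path, "r", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# The integer 50000 came back as the string "50000"
print(rows[0]["deal_value"], type(rows[0]["deal_value"]))  # → 50000 <class 'str'>

# Convert before doing math
total = sum(float(row["deal_value"]) for row in rows)
print(f"Total: ${total:,.2f}")  # → Total: $125,000.00
```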
Combining Regex and Files
import re
# Extract all email addresses from a file
with open("contacts.txt", "r", encoding="utf-8") as f:
    text = f.read()
emails = re.findall(r'[\w.-]+@[\w.-]+\.\w+', text)
print(f"Found {len(emails)} emails:")
for email in emails:
    print(f"  {email}")
This kind of automation is exactly what "Automate the Boring Stuff" is about. Instead of manually searching through 50 files for client emails, you can run the code above in a loop over every file in a folder and get the full list in seconds.
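The snippet above reads one file. To cover a whole folder, one option is `pathlib.Path.glob` plus a set to drop duplicates. In this sketch, `find_emails` is a hypothetical helper, and the layout (`*.txt` files in a single folder) and the two sample files are assumptions made for the demo. Use `rglob` instead of `glob` to include subfolders.

```python
import re
import tempfile
from pathlib import Path

EMAIL = re.compile(r'[\w.-]+@[\w.-]+\.\w+')

def find_emails(folder):
    """Hypothetical helper: sorted unique emails from every .txt file in folder."""
    found = set()  # a set keeps each address once, even if it appears in many files
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        found.update(EMAIL.findall(text))
    return sorted(found)

# Demo: two made-up files in a temporary folder
folder = Path(tempfile.mkdtemp())
(folder / "a.txt").write_text("Lia: lia@gecbusiness.com\n", encoding="utf-8")
(folder / "b.txt").write_text("cc lia@gecbusiness.com, sales@example.com\n", encoding="utf-8")

emails = find_emails(folder)
print(f"Found {len(emails)} emails:")
for email in emails:
    print(f"  {email}")
```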
Real-World Application
Think about your GEC work:
- Client data cleanup: Regex to standardize phone numbers, find malformed emails
- Report generation: Read raw data from CSV, process it, write formatted reports
- Log analysis: Search through export documentation for specific patterns
- Data migration: Read old format, transform, write new format
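Here is a small taste of the log-analysis item: scan lines for a pattern and report where each hit was found. The log format, the `ERROR` keyword and the lines themselves are invented for illustration. With a real file, you would loop over the open file object in the same way.

```python
import re

# Made-up log lines standing in for a real file
log_lines = [
    "2026-04-18 INFO shipment prepared",
    "2026-04-19 ERROR missing certificate for order #12345",
    "2026-04-20 INFO New deal closed",
    "2026-04-21 ERROR customs hold on order #12346",
]

pattern = re.compile(r'ERROR .*order #(\d+)')  # compile once, reuse in the loop
hits = []
# enumerate(..., start=1) gives human-friendly line numbers
for line_no, line in enumerate(log_lines, start=1):
    m = pattern.search(line)
    if m:
        hits.append((line_no, m.group(1)))
        print(f"line {line_no}: order {m.group(1)}")
# → line 2: order 12345
# → line 4: order 12346
```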
These are not theoretical skills. This is what you'll use Python for professionally.
What to Do This Week
- Read Chapters 7 and 8 of Automate the Boring Stuff
- Visit Regex101 (link above) and practice building patterns
- Practical exercise: Create a CSV file with 5-10 of your export clients (name, country, deal_value, email). Write a Python script that reads it, finds all deals over $30,000, and writes a filtered report to a new file.
- Take the quiz below.
Quiz
What does the regex pattern `\d{3}` match?