Web Scraping

Web scraping with Python has become my latest passion. Take, for instance, the challenge of keeping up with my daughter’s Little League schedule. The league’s game calendar is overcomplicated and requires a login to view. My wife has struggled to stay up to date because the web calendar is clunky and slow, and she doesn’t have time to copy and paste each game into our family calendar. Is it really so hard to offer a public .ics file or even a Google Calendar?

So, rather than wrestling with their one-of-a-kind solution every day, I wrote a web-scraping script so the computer can do the work. It signs in with Playwright, reads the event cards on the schedule page, and posts today’s games to Slack. Having a script grab info for me automatically is super handy, as I briefly talked about here.

import asyncio
from playwright.async_api import async_playwright
import json
from urllib import request, parse
import datetime

today_code = datetime.datetime.now().strftime('%a %b %d')  # match the date format on the site "Thu Apr 11"

def send_message_to_slack(text):
    """Post a plain-text message to the Slack incoming webhook."""
    post = {"text": str(text)}

    try:
        # json.dumps escapes non-ASCII characters (like emoji), so ASCII encoding is safe
        json_data = json.dumps(post)
        req = request.Request(
            "https://hooks.slack.com/services/API_KEY",
            data=json_data.encode('ascii'),
            headers={'Content-Type': 'application/json'},
        )
        request.urlopen(req)
    except Exception as em:
        print("EXCEPTION: " + str(em))

async def scrape_event_cards():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        # removed for privacy
        await page.get_by_role("button", name="Sign in").click()
        await page.wait_for_selector('.event-card')
        event_cards = await page.query_selector_all('.event-card')

        def clean_text(text):
            lines = [line.strip() for line in text.split('\n') if line.strip()]
            return ' '.join(lines)

        rina_baseball_message = ""
        for card in event_cards:
            text_content = await card.text_content()
            cleaned_text = clean_text(text_content)
            
            # Check if cleaned_text starts with today's day code
            if cleaned_text.startswith(today_code):
                rina_baseball_message += "\n>" + cleaned_text

        if rina_baseball_message != "": 
            send_message_to_slack("⚾ RinaBot 🧢\nToday's Schedule: \n" + rina_baseball_message.strip())

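        await browser.close()

# Start the event loop and run the scraper when the script is executed
if __name__ == "__main__":
    asyncio.run(scrape_event_cards())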
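
The filtering step is easy to try on its own, without a browser or a login. Here is a minimal sketch, assuming the card text looks roughly like the samples below (they are made up for illustration, not copied from the league site). `todays_events` is a hypothetical helper that is not part of the script above; it just wraps the same `clean_text` and `startswith` logic.

```python
import datetime


def clean_text(text):
    """Collapse a card's multi-line text into one space-separated line."""
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    return " ".join(lines)


def todays_events(card_texts, today):
    """Return the cleaned card texts that start with today's date code."""
    # Same format the script uses, e.g. "Thu Apr 11" (%d is zero-padded)
    today_code = today.strftime("%a %b %d")
    cleaned = (clean_text(text) for text in card_texts)
    return [text for text in cleaned if text.startswith(today_code)]


# Hypothetical card text, invented for this example
sample_cards = [
    "\n  Thu Apr 11\n  5:30 PM\n  Game vs. Tigers\n",
    "\n  Sat Apr 13\n  9:00 AM\n  Practice\n",
]

events = todays_events(sample_cards, datetime.date(2024, 4, 11))
print(events)
```

Passing the date in as an argument, instead of reading the clock when the module loads, makes it simple to check a specific day. One thing to watch: `%d` zero-pads the day, so if the site prints "Thu Apr 4" instead of "Thu Apr 04", the prefix will not match during the first nine days of a month.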

Setting up bots like this is great because it lets me glance at Slack and know what’s important for the day.

Rina Ball