feesta


2024-03-01

Week Notes #1

This week marks a new chapter for me: a period of exploration and reflection as I transition from a long tenure working on smart eyewear products to figuring out what’s next. To start, I’m thinking of using weeknotes as a way to share and track what I’m thinking about and tinkering on.

1: Decompressing: My first week without a full-time job in 8 years felt like a vacation. I soaked up the spring sun with several bike rides across San Francisco. After 15 years in the city, I’m still delighted that the ocean is just a short ride away from my home. I’m recording some rides with a GoPro to create extended views of the city’s less-trodden corners. This got me thinking of a project idea.

2: Something to hack on: I toyed with a concept to reimagine city streets with fewer cars. Could I use GenAI to visualize this across the city? Using video from my rides, I imagined transforming these spaces into parklets — a greener, more people-focused use of space. It might not be the right thing for every neighborhood, but would it be interesting to see?

3: Tech tinkering: My approach was to find all the cars and inpaint them with parklets or greenery. Inpainting is a GenAI method in which a generative model fills in a designated image mask, using the rest of the image as context. The goal was to use AI to sketch out a different urban landscape at city scale. To find the target elements, I processed frames with existing computer vision models (YOLO and semantic segmentation) to detect cars and trucks and create a mask. I fine-tuned Stable Diffusion on images of parklets and sketches of urban spaces to train a LoRA model, which would fill in the masked regions with greenery and people-focused elements. I then started processing segments of my rides.
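The mask-building step can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: it assumes the detector's output has already been converted into a hypothetical list of `(label, x1, y1, x2, y2)` pixel boxes, and the names `build_vehicle_mask`, `masked_fraction`, and the `pad` parameter are made up for the example. A segmentation model would give tighter, non-rectangular masks than these boxes do.

```python
import numpy as np

# Assumed detection format (hypothetical): (label, x1, y1, x2, y2) in pixels.
VEHICLE_LABELS = {"car", "truck"}

def build_vehicle_mask(frame_shape, detections, pad=8):
    """Return a uint8 mask (255 = region to inpaint) covering vehicle boxes."""
    height, width = frame_shape[:2]
    mask = np.zeros((height, width), dtype=np.uint8)
    for label, x1, y1, x2, y2 in detections:
        if label not in VEHICLE_LABELS:
            continue  # leave people, bikes, etc. untouched
        # Pad each box slightly and clamp it to the frame bounds.
        left = max(0, int(x1) - pad)
        top = max(0, int(y1) - pad)
        right = min(width, int(x2) + pad)
        bottom = min(height, int(y2) + pad)
        mask[top:bottom, left:right] = 255
    return mask

def masked_fraction(mask):
    """Share of the frame that the inpainting model would have to fill."""
    return float(np.count_nonzero(mask)) / mask.size

if __name__ == "__main__":
    detections = [
        ("car", 100, 400, 500, 700),
        ("person", 600, 300, 650, 500),
        ("truck", 900, 350, 1280, 720),
    ]
    mask = build_vehicle_mask((720, 1280, 3), detections)
    print(f"{masked_fraction(mask):.0%} of the frame would be inpainted")
```

The resulting mask, together with the original frame, is the kind of input an inpainting model expects. Checking the masked fraction per frame is a cheap way to see how much of the scene the model is being asked to invent.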

4: Pause to Reflect: By the end of the week, a few things had become clear. First, long-form video has a lot of frames. An hour at 30-60fps is at least 108,000 frames (30fps x 60s/min x 60min/hr), which is a lot of images to process at 20-60 seconds per image. Even at the low end, 108,000 frames x 20s/frame = 2,160,000 seconds, or 25 days of compute time per hour of video. Second, there are a lot of cars parked in the city. So much of the field of view in any frame needs to be infilled to replace the cars that the result looks more like a forest than a public space. Third, consistency of what’s generated from frame to frame was much more important than I expected. I may revisit this at some point, but I decided to move on for the time being. This first week was about indulging curiosity and trying something new without the weight of commitment. Looking forward to what comes next!
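For anyone who wants to rerun the frame-count arithmetic with their own numbers, here is a small sketch. The function names are made up for the example, and it assumes frames are processed one at a time on a single machine with no frame skipping or batching.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400

def frames_in_video(fps, hours):
    """Total number of frames in a video of the given length."""
    return int(fps * SECONDS_PER_HOUR * hours)

def compute_days(fps, hours, seconds_per_frame):
    """Days of sequential processing needed for the whole video."""
    total_seconds = frames_in_video(fps, hours) * seconds_per_frame
    return total_seconds / SECONDS_PER_DAY

if __name__ == "__main__":
    # The ranges quoted in the notes: 30-60fps, 20-60 seconds per image.
    for fps in (30, 60):
        for seconds_per_frame in (20, 60):
            days = compute_days(fps, 1, seconds_per_frame)
            print(f"{fps}fps at {seconds_per_frame}s/frame: {days:g} days per hour of video")
```

Under the same assumptions, processing only every tenth frame would cut the low-end estimate from 25 days to 2.5, though skipping frames would likely make frame-to-frame consistency harder to judge.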