Finding the nuggets of purest chat, deep from within the bowels of The Bean Machine.
All transcribed content copyright Three Bean Salad podcast.
What is this?
Three Bean Salad is a podcast by comedians Mike Wozniak, Henry Paker & Benjamin Partridge. AI has been used to convert episodes of the show to text, which can be searched here.
Why the weird domain name?
It matches the official domain enterthebeanmachine.boats which, as explained in the "Assassinations" episode, is because:
So this is because you've legally had to register the bean machine as a frigate, haven't you? In international waters. As an unarmed frigate. As an unarmed frigate. And that's for insurance purposes, isn't it?
That should explain it for you.
Which episodes have been transcribed?
All episodes in the 'Pinto' Patreon feed have been transcribed. New episodes should automatically be added shortly after being published.
Can I trust the results?
The results haven't been verified word for word, and shouldn't be used as the basis of any litigation or major life decision. However, the AI system seems to have done an excellent job, managing to transcribe such esoteric terms as "fanjambo", "The Streets of Beanadelfia", and "beanmageddon".
Why is the text broken into little chunks?
The chunks of text are called "segments". A segment is a group of words determined by the structure of the audio, not by its punctuation or meaning. It's how the transcription system breaks the text down without having to understand what is being said. Clicking on a result will bring up roughly 30s either side of the segment, to hopefully put it in context.
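For the curious, here is a minimal sketch of how "roughly 30s either side" could be looked up. It is not the site's actual code: the table layout, the column names (episode_id, start_s, end_s, text) and the context() helper are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per segment, with start/end times in seconds.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments (episode_id INTEGER, start_s REAL, end_s REAL, text TEXT)"
)
conn.executemany(
    "INSERT INTO segments VALUES (?, ?, ?, ?)",
    [
        (1, 0.0, 5.0, "a"),
        (1, 40.0, 45.0, "b"),
        (1, 70.0, 75.0, "c"),    # the segment the user clicked on
        (1, 100.0, 104.0, "d"),
        (1, 140.0, 145.0, "e"),
        (2, 70.0, 75.0, "other episode"),
    ],
)


def context(conn, episode_id, start_s, end_s, window_s=30.0):
    """Return the text of every segment in the same episode that overlaps
    the range [start_s - window_s, end_s + window_s], in playback order."""
    cur = conn.execute(
        "SELECT text FROM segments "
        "WHERE episode_id = ? AND end_s >= ? AND start_s <= ? "
        "ORDER BY start_s",
        (episode_id, start_s - window_s, end_s + window_s),
    )
    return [row[0] for row in cur]


surrounding = context(conn, episode_id=1, start_s=70.0, end_s=75.0)
```

Because segments follow the audio rather than the sentences, the window is matched on overlap, so a segment that only partly falls inside the 30s is still included.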
Any searching tips?
Search terms are fuzzy, so searching for "big tree" returns "between the big tree and the bench" from the Maps episode, as well as "tree or a big bush" from the Whales episode a year later. Putting quotes around terms will disable fuzzy matching. Wildcards are also supported, e.g. "beana*" returns "beanage" and "Beanadelfia".
I am a nerd and would like to learn more.
The code and more technical details are available on GitHub, but tl;dr: episodes are fetched from the Patreon RSS feed; OpenAI's Whisper library processes the files locally and produces text segments; the text is inserted into an SQLite database that uses an FTS5 full-text-search table. The web stack is Python Flask and vanilla JS. The whole thing runs on the smallest fly.io instance. With 206 episodes transcribed, the database is 75MB in size.
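As a rough illustration of the FTS5 part, the sketch below builds a tiny in-memory table and runs the three kinds of search mentioned in the tips above. It assumes the site's behaviour maps onto standard FTS5 query syntax (separate terms are ANDed in any order, double quotes make an exact phrase, a trailing * is a prefix match); the table name, column names and search() helper are hypothetical, the "Example A" and "Example B" rows are made-up sample data, and it needs a Python build whose SQLite includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the episode name is stored but not indexed for search.
conn.execute("CREATE VIRTUAL TABLE segments USING fts5(episode UNINDEXED, text)")
conn.executemany(
    "INSERT INTO segments (episode, text) VALUES (?, ?)",
    [
        ("Maps", "between the big tree and the bench"),
        ("Whales", "tree or a big bush"),
        ("Example A", "a fair amount of beanage"),    # invented sample text
        ("Example B", "The Streets of Beanadelfia"),  # invented episode name
    ],
)


def search(query):
    """Return the episodes whose segment text matches an FTS5 query string."""
    cur = conn.execute(
        "SELECT episode FROM segments WHERE segments MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in cur]


loose = search("big tree")     # both words, any order, anywhere in the segment
exact = search('"big tree"')   # quoted: the words must appear as a phrase
prefix = search("beana*")      # wildcard: any word starting with "beana"
```

Here loose finds both the Maps and Whales rows, exact finds only Maps, and prefix finds the two bean-flavoured sample rows.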
How can I support this site?
The best thing you can do is support Three Bean Salad. Go buy some (more) merch. If you have too much already you can send it to me instead.
Feedback? Questions?
You can email me at fanjambo@searchthebeanmachine.boats, or leave technical suggestions on GitHub.