Haiku is a form of poetry that originated in Japan. Its beauty lies in its brevity, which has helped it spread well beyond its original language. In English, it usually follows a three-line, 5-7-5 syllable structure. In this exploration, I combined my interests in linguistics, poetry, and technology to create a haiku generator: one that takes current events, in the form of news articles, and distills them into haikus.
The underlying algorithm is quite simple. It consists of two main components. The first processes the raw data (the words from the JSON objects holding the news articles): it tags each word with its part of speech (POS) and then pairs adjacent words into a series of bigrams. The second component generates the haikus themselves. It builds each line by searching through the bigrams and stringing them together until the line reaches exactly the required syllable count. It then repeats this for each line until it has completed the full 5-7-5 structure.
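The generator's code isn't shown here, so what follows is only a minimal Python sketch of the two components, built on several assumptions. Tokens arrive already tagged as (word, tag) pairs, such as the Penn Treebank tags NLTK's `pos_tag` returns; the tagging call itself is not shown. Syllables are counted with a crude vowel-group heuristic rather than a pronunciation dictionary. The POS tags serve two purposes: they keep homographs apart, and they stop a line from ending on a determiner, preposition, or conjunction. The helper names (`count_syllables`, `build_bigrams`, `generate_line`, `generate_haiku`) and the `BAD_LINE_ENDINGS` rule are hypothetical.

```python
import random
import re
from collections import defaultdict
from typing import Optional

# Assumed input format: each headline is a list of (word, POS tag) pairs,
# e.g. Penn Treebank tags such as a tagger like NLTK's pos_tag produces.
Tagged = tuple[str, str]

# Hypothetical rule: don't end a line on a determiner, preposition,
# coordinating conjunction, or "to".
BAD_LINE_ENDINGS = {"DT", "IN", "CC", "TO"}


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def build_bigrams(sentences: list[list[Tagged]]) -> dict[Tagged, list[Tagged]]:
    """Component 1: map each tagged token to the distinct tokens that follow it."""
    successors: dict[Tagged, list[Tagged]] = defaultdict(list)
    for sentence in sentences:
        for first, second in zip(sentence, sentence[1:]):
            if second not in successors[first]:  # list keeps order reproducible
                successors[first].append(second)
    return dict(successors)


def generate_line(
    successors: dict[Tagged, list[Tagged]], target: int, rng: random.Random
) -> Optional[list[Tagged]]:
    """Component 2: depth-first search for a bigram chain of exactly `target` syllables."""

    def extend(path: list[Tagged], total: int) -> Optional[list[Tagged]]:
        if total == target:
            # Accept the line only if it doesn't end on a "dangling" tag.
            return path if path[-1][1] not in BAD_LINE_ENDINGS else None
        options = list(successors.get(path[-1], []))
        rng.shuffle(options)  # randomness yields a different haiku per seed
        for nxt in options:
            syllables = count_syllables(nxt[0])
            if total + syllables <= target:  # prune chains that overshoot
                found = extend(path + [nxt], total + syllables)
                if found:
                    return found
        return None  # dead end: backtrack to the previous word

    starts = list(successors)
    rng.shuffle(starts)
    for first in starts:
        syllables = count_syllables(first[0])
        if syllables <= target:
            found = extend([first], syllables)
            if found:
                return found
    return None


def generate_haiku(
    sentences: list[list[Tagged]], seed: Optional[int] = None
) -> Optional[list[str]]:
    """Build a 5-7-5 haiku, or return None if the corpus can't fill a line."""
    successors = build_bigrams(sentences)
    rng = random.Random(seed)
    lines = []
    for target in (5, 7, 5):
        line = generate_line(successors, target, rng)
        if line is None:
            return None
        lines.append(" ".join(word for word, _ in line))
    return lines


if __name__ == "__main__":
    # Made-up, pre-tagged headlines for illustration only.
    headlines = [
        [("stocks", "NNS"), ("rise", "VBP"), ("as", "IN"), ("markets", "NNS"),
         ("open", "VBP"), ("higher", "JJR"), ("today", "NN")],
        [("storms", "NNS"), ("batter", "VBP"), ("the", "DT"), ("eastern", "JJ"),
         ("coast", "NN"), ("tonight", "NN")],
    ]
    print(generate_haiku(headlines, seed=7))
```

The search is exhaustive, so `generate_line` returns `None` only when no chain of bigrams in the corpus can hit the target exactly. That outcome signals that more headlines are needed; it does not mean the search gave up early.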
Why bigrams? Mostly for ease, along with some practical considerations. I could have implemented a higher-order n-gram model, but it would (a) be more resource-intensive and (b) probably exclude some of the more 'serendipitous' word combinations.
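Point (b) is easier to see with a small example. Below is a hypothetical Python illustration; the three headlines and the `successor_table` helper are invented for this purpose and do not come from the generator. The code builds successor tables for n = 2 (bigrams) and n = 3 (trigrams). Under the bigram model, the word "as" can be followed by any of three words. That makes a chain like "markets rally as inflation cools" possible, even though it appears in no headline. Under the trigram model, the context "rally as" has exactly one continuation, so the output tends to replay the source headlines.

```python
from collections import defaultdict


def successor_table(headlines: list[str], n: int) -> dict[tuple[str, ...], set[str]]:
    """Map each (n-1)-word context to the set of words observed right after it.

    n=2 gives a bigram table, n=3 a trigram table. Contexts never span
    two headlines.
    """
    table: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for headline in headlines:
        words = headline.lower().split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i : i + n - 1])
            table[context].add(words[i + n - 1])
    return dict(table)


headlines = [
    "Markets rally as rates fall",
    "Rates rise as inflation cools",
    "Storms hit as markets close",
]

bigrams = successor_table(headlines, 2)
trigrams = successor_table(headlines, 3)

print(sorted(bigrams[("as",)]))           # ['inflation', 'markets', 'rates']
print(sorted(trigrams[("rally", "as")]))  # ['rates']
print(len(bigrams), len(trigrams))        # 8 9
```

The trigram table keys on word pairs, so it holds more contexts than the bigram table (9 versus 8 here) even though each context offers fewer choices. On a real news corpus the number of distinct contexts grows quickly with n, which is one source of the extra resource cost in point (a).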
The final output was then wrapped in a web app that generates a series of haikus from the day's top headlines. Each time the user refreshes the page, a new set of haikus is created.
A few ways this concept can be taken further: