Thing Report

↩ Index

Haiku Generator

Haiku is a form of poetry originating in Japan. The beauty of its brevity has popularized it beyond its original language; in English, it usually appears in a 5-7-5 syllable structure. In this exploration, I combined my interests in linguistics, poetry, and technology to create a haiku generator: one that takes current events in the form of news articles and distills them into haikus.

Example article
Example haiku

The underlying algorithm is quite simple. It consists of two main components: the first processes the raw data, the words from the JSON objects, by tagging each with its corresponding part of speech (POS) and then pairing the words off into a series of bigrams. The second component is the actual generation of haikus. It constructs lines by searching through and stringing bigrams together until the exact syllable count has been achieved, repeating this until the entire 5-7-5 syllabic structure is filled.
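The two components described above can be sketched in plain Python. This is a minimal stand-in, not the project's actual code: the real version tags words with NLTK before pairing them, and the `count_syllables` heuristic here (counting vowel groups) is a rough substitute for a proper syllable source.

```python
import random

def count_syllables(word):
    # Rough heuristic: count runs of consecutive vowels as one syllable each.
    # A stand-in for a real syllable dictionary.
    word = word.lower()
    groups, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in "aeiouy"
        if is_vowel and not prev_vowel:
            groups += 1
        prev_vowel = is_vowel
    return max(groups, 1)

def build_bigrams(words):
    # Component 1: pair consecutive words off into bigrams.
    return list(zip(words, words[1:]))

def make_line(bigrams, target, rng, tries=1000):
    # Component 2: string bigrams together until the line hits the exact
    # syllable count, restarting from a fresh random word on a miss.
    followers = {}
    for a, b in bigrams:
        followers.setdefault(a, []).append(b)
    for _ in range(tries):
        word = rng.choice(bigrams)[0]
        line, count = [word], count_syllables(word)
        while count < target and line[-1] in followers:
            nxt = rng.choice(followers[line[-1]])
            line.append(nxt)
            count += count_syllables(nxt)
        if count == target:
            return " ".join(line)
    return None

rng = random.Random(0)
words = "the quiet morning rain falls over the sleeping city again".split()
bigrams = build_bigrams(words)
haiku = [make_line(bigrams, n, rng) for n in (5, 7, 5)]
```

Because the walk restarts whenever it overshoots the target, the line lengths come out exact rather than approximate, which is what the 5-7-5 form demands.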

The entire application, barring the HTML/CSS and minimal JavaScript, is written in Python. It also relies heavily on NLP libraries, particularly NLTK.

Flowchart

Why bigrams? Mostly ease, along with some practical considerations. I could have implemented an n-gram model, but that would be (a) more resource-intensive and (b) likely to exclude some of the more 'serendipitous' word combinations.
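The trade-off is easy to see if you generalize the pairing step. An n-gram conditions each next word on the previous n−1 words, so larger n yields fewer, more constrained continuations. This illustrative helper is not from the project:

```python
def ngrams(words, n):
    # Slide a window of size n over the word list.
    # n=2 reproduces the bigram pairing used by the generator;
    # larger n narrows the set of possible continuations.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "the quiet morning rain falls".split()
pairs = ngrams(words, 2)    # bigrams: 4 of them
triples = ngrams(words, 3)  # trigrams: only 3
```

With trigrams, a given two-word prefix typically has far fewer observed followers than a single word does, which is exactly why the bigram version produces more unexpected combinations.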

Tagged Bigrams

The final output was then wrapped in a web-app, which generates a series of haikus from the day's top headlines. When the user refreshes the page, a new set of haikus will be created.

Demo Video

Reflections

A few ways this concept can be taken further:

  1. More words: due to cost constraints, I didn't use the commercial version of the Google News API. The free version retrieves only a 'summary' of the article, roughly a paragraph long. More words would enable a more complex algorithm to determine which words and concepts are central to the event.
  2. More complex algorithm: bigrams (or n-grams, for that matter) are among the simplest NLP techniques, and they don't exhibit any real understanding of language, or of concepts beyond the probability of words occurring in a given sequence. A generative adversarial network, for example, could create haikus that read as more 'human', but would require a great deal of human-generated training data. A potential compromise could take the form of a hybrid n-gram generator and neural-net discriminator, where the generator creates a large number of candidates and the discriminator simply selects the highest-scoring of the bunch.
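The hybrid idea in the last point is just generate-then-rank. A minimal sketch, with a toy generator and a hand-written scoring function standing in for a trained discriminator (both are hypothetical, not part of the project):

```python
import random

def generate_candidates(make_haiku, k, rng):
    # The n-gram generator proposes k candidate haikus.
    return [make_haiku(rng) for _ in range(k)]

def pick_best(candidates, score):
    # The discriminator's only job is selection:
    # return the highest-scoring candidate.
    return max(candidates, key=score)

rng = random.Random(1)
words = ["rain", "city", "falls", "quiet", "morning"]

# Toy generator: five random words. A real one would be the bigram walker.
make_haiku = lambda r: " ".join(r.choice(words) for _ in range(5))

# Toy scorer: reward lexical variety. A real discriminator would be a
# trained network returning a learned "human-ness" score.
score = lambda h: len(set(h.split()))

best = pick_best(generate_candidates(make_haiku, 50, rng), score)
```

The appeal of this split is that the cheap generator does the combinatorial work while the expensive model only has to rank, which needs far less training data than generating from scratch.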