Getting Started

Livebook is a powerful Elixir tool that excels at data exploration and insight sharing. If you’re familiar with Python’s Jupyter Notebooks, you’ll feel right at home. But even if you’re new to this type of tool, don’t worry – Livebook is incredibly user-friendly and requires minimal setup.

In this tutorial, we’ll harness Livebook’s capabilities to caption images using your own machine’s hardware, all without writing a single line of code. This means you can generate image captions locally, without relying on external services. The best part? You don’t need a high-powered GPU – any machine will do.

Prerequisites

Before we dive in, you’ll need to install two key components: Elixir and Livebook.

Elixir

Elixir is compatible with all major operating systems. For installation instructions specific to your OS, visit the official Elixir website. While there are several installation methods available, I recommend using the asdf version manager for its flexibility. For this tutorial, I’ll be using Elixir 1.17.2.
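If you go the asdf route, installation comes down to a few commands. A rough sketch follows — the version strings are examples matching what I'm using, and exact subcommands can differ between asdf releases, so check the asdf docs for your setup. Note that Elixir builds are compiled against a specific Erlang/OTP version, so install Erlang first:

```shell
# Add the Erlang and Elixir plugins to asdf
asdf plugin add erlang
asdf plugin add elixir

# Install Erlang, then an Elixir build matching that OTP version
asdf install erlang 27.0
asdf install elixir 1.17.2-otp-27

# Make these the default versions for your user
asdf global erlang 27.0
asdf global elixir 1.17.2-otp-27
```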

Note: If you plan to run Livebook in a Docker container, you can skip this step.

Livebook

Livebook offers multiple installation options. You can find detailed instructions on the Livebook GitHub repository. In my experience, both the direct installation and Docker methods are straightforward.
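For reference, the two methods I mentioned look roughly like this — the escript option requires Elixir from the previous step, and the Docker image name and ports below are the ones documented in the repository at the time of writing, so double-check there for current instructions:

```shell
# Option 1: install Livebook as an Elixir escript
mix escript.install hex livebook

# Option 2: run the official Docker image instead
docker run -p 8080:8080 -p 8081:8081 --pull always ghcr.io/livebook-dev/livebook
```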

Captioning Our First Image

With Elixir and Livebook installed, let’s jump into image captioning!

Step 1: Launch Livebook

Start by firing up a new Livebook server. If you’ve done a direct installation, use this command:

livebook server

You should see a message like this:

[Livebook] Application running at http://localhost:8080/?token=aypg2sijd5fpgv7sjl7zobuui4ijdaje

Open the provided URL in your browser to access Livebook.

Livebook Homepage

Step 2: Create Your Notebook

On the Livebook interface, click “New Notebook”. Give your notebook a meaningful name – I’m calling mine “Image Captioning”. Then, name the first section; I’ve chosen “Demo”. Don’t worry about dependencies or packages for now – we’ll tackle those later.

Step 3: Set Up the Neural Network Task Smart Cell

While Livebook typically uses code or markdown cells, it also offers smart cells for high-level tasks without coding. We’ll use one of these for our image captioning.

To add a smart cell:

  1. Hover your mouse below any existing cell
  2. Click the “+ Smart” button
  3. From the list that appears, choose “Neural Network task”

Adding the Neural Network task smart cell

Livebook will prompt you to add necessary packages. Click “+ Add and restart”. The installation may take a moment, but once it’s done, your smart cell will be ready to use.

Pro tip: You can delete any empty code cells at this point – we won’t need them for this tutorial.
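For context, clicking “+ Add and restart” populates your notebook’s setup cell with something along these lines (a sketch — the exact package versions will vary; EXLA is the compiled CPU/GPU backend for Nx, which speeds up model inference considerably):

```elixir
Mix.install(
  [
    {:kino_bumblebee, "~> 0.5"},
    {:exla, ">= 0.0.0"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)
```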

Step 4: Choose Your Model

Now, let’s configure our smart cell:

  1. In the “Task” dropdown, select “Image-to-Text”. This narrows our options to models designed for image captioning.
  2. For the “Using” dropdown, choose either “BLIP (base)” or “BLIP (large)”. Both are pre-trained models, but they have different characteristics:
    • BLIP (base): Smaller and faster, ideal if you’re concerned about speed or storage.
    • BLIP (large): Larger and more accurate, generally produces better results.

Smart cell added and loaded

  3. Click the “Evaluate” (or “Reevaluate”) button above the smart cell. This initiates the model download and loading process. You’ll see a progress bar – be patient, as these models are sizeable.

Once complete, your smart cell will display an area for image uploads.
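If you’re curious what the smart cell is doing under the hood, it boils down to a handful of Bumblebee calls. Here’s a rough sketch, assuming the bumblebee, exla, and stb_image packages are available in your notebook; "photo.jpg" is a placeholder path for your own image:

```elixir
repo = {:hf, "Salesforce/blip-image-captioning-base"}

# Download the pre-trained BLIP model and its companions from Hugging Face
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, featurizer} = Bumblebee.load_featurizer(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

# Build an Nx.Serving that maps an image to a generated caption
serving =
  Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config)

# Read an image from disk and run it through the serving
image = StbImage.read_file!("photo.jpg")
%{results: [%{text: caption}]} = Nx.Serving.run(serving, image)
```

Swapping `blip-image-captioning-base` for `blip-image-captioning-large` gives you the larger model, mirroring the dropdown choice above.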

Step 5: Generate Your First Caption

Now for the exciting part! You have two options to add an image:

  1. Drag and drop an image directly into the smart cell
  2. Click the “Upload” button to select an image from your computer

After adding your image, click “Run”. The model will process your image and generate a caption.

The generated caption using the base BLIP model.

The generated caption using the large BLIP model. It contains more detail than the base model’s caption. ‘arafed’ appears to be an artifact the BLIP models often produce; it doesn’t mean anything here.

Wrapping Up

Congratulations! You’ve successfully captioned an image using Livebook, all without writing a single line of code. This powerful tool has allowed you to leverage a pre-trained model for image captioning, right on your local machine.

But this is just the tip of the iceberg. Livebook offers a wealth of possibilities beyond image captioning. I encourage you to explore further – try out different tasks and models available in the Neural Network smart cell. In our next section, we’ll delve deeper, using code to perform the same task but with greater control and flexibility. Stay tuned!