Run LiteRT Next on Android with Kotlin

The LiteRT Next APIs are available in Kotlin, which offers Android developers a seamless development experience with access to high-level APIs.

For an example of a LiteRT Next application in Kotlin, see the Image segmentation with Kotlin demo.

Get Started

Use the following steps to add LiteRT Next to your Android application.

Add Maven package

Add the LiteRT Next dependency to your application:

dependencies {
  ...
  implementation("com.google.ai.edge.litert:litert:2.0.0-alpha")
}

Create the Compiled Model

Using the CompiledModel API, initialize the runtime with a model and your choice of hardware acceleration:

val model =
  CompiledModel.create(
    context.assets,
    "mymodel.tflite",
    CompiledModel.Options(Accelerator.CPU),
    env,
  )
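
The Options parameter selects the hardware accelerator. As a minimal sketch, assuming the target device supports GPU execution and using the same env handle as above, you can request the GPU instead of the CPU:

// Sketch: request GPU execution instead of CPU (assumes GPU support on the device)
val gpuModel =
  CompiledModel.create(
    context.assets,
    "mymodel.tflite",
    CompiledModel.Options(Accelerator.GPU),
    env,
  )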

Create Input and Output Buffers

Create the necessary data structures (buffers) to hold the input data that you will feed into the model for inference, and the output data that the model produces after running inference.

val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()

If you are using CPU memory, fill the inputs by writing data directly into the first input buffer.

inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })

Invoke the model

Run the compiled model, providing the input and output buffers.

model.run(inputBuffers, outputBuffers)

Retrieve Outputs

Retrieve outputs by directly reading the model output from memory.

val outputFloatArray = outputBuffers[0].readFloat()

Key concepts and components

Refer to the following sections for information on key concepts and components of the LiteRT Next Kotlin APIs.

Basic Inference (CPU)

The following is a condensed, simplified implementation of inference with LiteRT Next.

// Load model and initialize runtime
val model =
    CompiledModel.create(
        context.assets,
        "mymodel.tflite"
    )

// Preallocate input/output buffers
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()

// Fill the first input
inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })

// Invoke
model.run(inputBuffers, outputBuffers)

// Read the output
val outputFloatArray = outputBuffers[0].readFloat()

// Clean up buffers and model
inputBuffers.forEach { it.close() }
outputBuffers.forEach { it.close() }
model.close()
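
Because the buffers and the compiled model hold native resources, it can help to guard cleanup with try/finally. The following is a plain Kotlin sketch built only on the close() calls shown above; data_size and data_value are placeholders for your own data:

val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()
try {
  // Fill the first input, run inference, and read the result
  inputBuffers[0].writeFloat(FloatArray(data_size) { data_value })
  model.run(inputBuffers, outputBuffers)
  val outputFloatArray = outputBuffers[0].readFloat()
} finally {
  // Release buffer memory and the compiled model even if inference throws
  inputBuffers.forEach { it.close() }
  outputBuffers.forEach { it.close() }
  model.close()
}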

Compiled Model (CompiledModel)

The Compiled Model API (CompiledModel) is responsible for loading a model, applying hardware acceleration, instantiating the runtime, creating input and output buffers, and running inference.

The following simplified code snippet demonstrates how the Compiled Model API takes a LiteRT model (.tflite) and creates a compiled model that is ready to run inference.

val model =
  CompiledModel.create(
    context.assets,
    "mymodel.tflite"
  )

The following simplified code snippet demonstrates how the CompiledModel API takes input and output buffers and runs inference with the compiled model.

// Preallocate input/output buffers
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()

// Fill the first input
inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })
// Invoke
model.run(inputBuffers, outputBuffers)
// Read the output
val outputFloatArray = outputBuffers[0].readFloat()

// Clean up buffers and model
inputBuffers.forEach { it.close() }
outputBuffers.forEach { it.close() }
model.close()

For a more complete view of how the CompiledModel API is implemented, see the source code at Model.kt.

Tensor Buffer (TensorBuffer)

LiteRT Next provides built-in support for I/O buffer interoperability, using the Tensor Buffer API (TensorBuffer) to handle the flow of data into and out of the CompiledModel. The Tensor Buffer API provides the ability to write (Write<T>()), read (Read<T>()), and lock buffers.
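
As a minimal sketch of that flow, reusing only the calls shown earlier on this page (data_size and data_value are placeholders for your own data), a TensorBuffer carries data into and out of the compiled model and is closed when no longer needed:

// Buffers returned by the compiled model are TensorBuffer instances
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()

// Write input data, run inference, then read the result back out
inputBuffers[0].writeFloat(FloatArray(data_size) { data_value })
model.run(inputBuffers, outputBuffers)
val result = outputBuffers[0].readFloat()

// TensorBuffers hold native memory, so close them when you are done
inputBuffers.forEach { it.close() }
outputBuffers.forEach { it.close() }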

For a more complete view of how the Tensor Buffer API is implemented, see the source code at TensorBuffer.kt.