The LiteRT Next APIs are available in Kotlin, which offers Android developers a seamless development experience with access to high-level APIs.
For an example of a LiteRT Next application in Kotlin, see the Image segmentation with Kotlin demo.
Get Started
Use the following steps to add LiteRT Next to your Android application.
Add Maven package
Add the LiteRT Next dependency to your application:
dependencies {
    ...
    implementation("com.google.ai.edge.litert:litert:2.0.0-alpha")
}
Create the Compiled Model
Using the CompiledModel API, initialize the runtime with a model and your choice of hardware acceleration:
val model =
    CompiledModel.create(
        context.assets,
        "mymodel.tflite",
        CompiledModel.Options(Accelerator.CPU),
        env,
    )
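The Accelerator value controls which hardware backend the model is compiled for. The following is a minimal sketch of requesting GPU execution instead; Accelerator.GPU is an assumption here, so confirm the accelerator values available in your LiteRT Next version.
// Same creation call as above, but requesting the GPU backend (assumed value).
val gpuModel =
    CompiledModel.create(
        context.assets,
        "mymodel.tflite",
        CompiledModel.Options(Accelerator.GPU),
        env,
    )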
Create Input and Output Buffers
Create the necessary data structures (buffers) to hold the input data that you will feed into the model for inference, and the output data that the model produces after running inference.
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()
If you are using CPU memory, fill the inputs by writing data directly into the first input buffer.
inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })
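In practice, the input FloatArray usually comes from preprocessed application data. The following hypothetical sketch converts an Android Bitmap into normalized RGB floats before writing it to the first input buffer; sourceBitmap, the 256x256 input size, and the [0, 1] normalization are assumptions, so substitute your model's actual input shape and value range.
import android.graphics.Bitmap

// Hypothetical preprocessing: scale, extract ARGB pixels, normalize to [0, 1].
val bitmap = Bitmap.createScaledBitmap(sourceBitmap, 256, 256, true)
val pixels = IntArray(256 * 256)
bitmap.getPixels(pixels, 0, 256, 0, 0, 256, 256)
val input = FloatArray(256 * 256 * 3)
for (i in pixels.indices) {
    val p = pixels[i]
    input[i * 3] = ((p shr 16) and 0xFF) / 255f     // R
    input[i * 3 + 1] = ((p shr 8) and 0xFF) / 255f  // G
    input[i * 3 + 2] = (p and 0xFF) / 255f          // B
}
inputBuffers[0].writeFloat(input)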
Invoke the model
Run the compiled model, providing the input and output buffers.
model.run(inputBuffers, outputBuffers)
Retrieve Outputs
Retrieve outputs by directly reading the model output from memory.
val outputFloatArray = outputBuffers[0].readFloat()
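How you interpret the returned FloatArray depends on the model. As a purely illustrative, hypothetical example, a single-label classifier's scores could be reduced to the index of the highest value:
// Hypothetical post-processing for a classification-style output:
// pick the index with the highest score. Segmentation or detection
// models need their own decoding logic.
val bestIndex = outputFloatArray.indices.maxByOrNull { outputFloatArray[it] } ?: -1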
Key concepts and components
Refer to the following sections for information on key concepts and components of the LiteRT Next Kotlin APIs.
Basic Inference (CPU)
The following is a condensed, simplified implementation of inference with LiteRT Next.
// Load model and initialize runtime
val model =
    CompiledModel.create(
        context.assets,
        "mymodel.tflite"
    )
// Preallocate input/output buffers
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()
// Fill the first input
inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })
// Invoke
model.run(inputBuffers, outputBuffers)
// Read the output
val outputFloatArray = outputBuffers[0].readFloat()
// Clean up buffers and model
inputBuffers.forEach { it.close() }
outputBuffers.forEach { it.close() }
model.close()
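The snippet above releases the buffers and the model at the end of the happy path. The following is a minimal sketch that guarantees cleanup even if inference throws, using only standard Kotlin try/finally around the same calls shown above.
val model = CompiledModel.create(context.assets, "mymodel.tflite")
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()
try {
    inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })
    model.run(inputBuffers, outputBuffers)
    val outputFloatArray = outputBuffers[0].readFloat()
} finally {
    // Always release buffers and the model, even on failure.
    inputBuffers.forEach { it.close() }
    outputBuffers.forEach { it.close() }
    model.close()
}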
Compiled Model (CompiledModel)
The Compiled Model API (CompiledModel) is responsible for loading a model, applying hardware acceleration, instantiating the runtime, creating input and output buffers, and running inference.
The following simplified code snippet demonstrates how the Compiled Model API takes a LiteRT model (.tflite) and creates a compiled model that is ready to run inference.
val model =
    CompiledModel.create(
        context.assets,
        "mymodel.tflite"
    )
The following simplified code snippet demonstrates how the CompiledModel API takes an input and an output buffer, and runs inference with the compiled model.
// Preallocate input/output buffers
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()
// Fill the first input
inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })
// Invoke
model.run(inputBuffers, outputBuffers)
// Read the output
val outputFloatArray = outputBuffers[0].readFloat()
// Clean up buffers and model
inputBuffers.forEach { it.close() }
outputBuffers.forEach { it.close() }
model.close()
For a more complete view of how the CompiledModel API is implemented, see the source code at Model.kt.
Tensor Buffer (TensorBuffer)
LiteRT Next provides built-in support for I/O buffer interoperability, using the Tensor Buffer API (TensorBuffer) to handle the flow of data into and out of the CompiledModel. The Tensor Buffer API provides the ability to write (Write<T>()), read (Read<T>()), and lock buffers.
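As a rough Kotlin illustration of that flow, the following sketch uses only the buffer calls already shown in the snippets above (writeFloat(), readFloat(), and close()); the lock operations are omitted here.
// Move float data through the model's tensor buffers.
val inputBuffers = model.createInputBuffers()    // one buffer per model input
val outputBuffers = model.createOutputBuffers()  // one buffer per model output
inputBuffers[0].writeFloat(floatArrayOf(0.1f, 0.2f, 0.3f))  // write into the first input
model.run(inputBuffers, outputBuffers)
val result = outputBuffers[0].readFloat()        // read the first output as a FloatArray
(inputBuffers + outputBuffers).forEach { it.close() }       // release the buffers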
For a more complete view of how the Tensor Buffer API is implemented, see the source code at TensorBuffer.kt.