The AI Edge Function Calling SDK (FC SDK) is a library that enables developers to use function calling with on-device LLMs. Function calling lets you connect models to external tools and APIs, enabling models to call specific functions with the necessary parameters to execute real-world actions.
Rather than just generating text, an LLM using the FC SDK can generate a structured call to a function that executes an action, such as searching for up-to-date information, setting alarms, or making reservations.
This guide walks you through a basic quickstart to add the LLM Inference API with the FC SDK to an Android application. This guide focuses on adding function calling capabilities to an on-device LLM. For more information on using the LLM Inference API, see the LLM Inference for Android guide.
Quickstart
Follow these steps to use the FC SDK in your Android application. This quickstart uses the LLM Inference API with Gemma-3 1B. The LLM Inference API is optimized for high-end Android devices, such as Pixel 8 and Samsung S23 or later, and does not reliably support device emulators.
Add dependencies
The FC SDK uses the com.google.ai.edge.localagents:localagents-fc
library and
the LLM Inference API uses the com.google.mediapipe:tasks-genai
library. Add
both dependencies to the build.gradle
file of your Android app:
dependencies {
    implementation 'com.google.mediapipe:tasks-genai:0.10.23'
    implementation 'com.google.ai.edge.localagents:localagents-fc:0.1.0'
}
For devices with Android 12 (API 31) or higher, add the native OpenCL library
dependency. For more information, see the documentation on the
uses-native-library
tag.
Add the following uses-native-library
tags to the AndroidManifest.xml
file:
<uses-native-library android:name="libOpenCL.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>
Download a model
Download Gemma-3 1B in a 4-bit quantized format from Hugging Face. For more information on the available models, see the Models documentation.
Push the gemma3-1b-it-int4.task file to the Android device.
$ adb shell rm -r /data/local/tmp/llm/ # Remove any previously loaded models
$ adb shell mkdir -p /data/local/tmp/llm/
$ adb push gemma3-1b-it-int4.task /data/local/tmp/llm/gemma3-1b-it-int4.task
Declare function definitions
Define the functions that will be made available to the model. To illustrate the process, this quickstart includes two functions as static methods that return hard-coded responses. A more practical implementation would define functions that call a REST API or retrieve information from a database (see the sketch after the class below).
The following defines the getWeather
and getTime
functions:
class ToolsForLlm {
    public static String getWeather(String location) {
        return "Cloudy, 56°F";
    }

    public static String getTime(String timezone) {
        return "7:00 PM " + timezone;
    }

    private ToolsForLlm() {}
}
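As noted above, a more practical tool would call a real service instead of returning a hard-coded string. The following is a minimal sketch of what a REST-backed variant of getWeather could look like; the endpoint URL is a hypothetical placeholder, and the raw response body is returned unparsed:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

class WeatherTool {
    // Hypothetical endpoint; replace with the weather service your app uses.
    private static final String WEATHER_API = "https://example.com/weather?location=";

    // Requires the INTERNET permission and must run off the main thread on Android.
    public static String getWeather(String location) throws Exception {
        var url = new URL(WEATHER_API + URLEncoder.encode(location, "UTF-8"));
        var connection = (HttpURLConnection) url.openConnection();
        try (var reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
            var body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
            // Return the raw response body; a real app would parse it into a short summary.
            return body.toString();
        } finally {
            connection.disconnect();
        }
    }

    private WeatherTool() {}
}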
Use FunctionDeclaration to describe each function, giving each a name and description and specifying the parameter types. This informs the model of what the functions do and when to make function calls.
var getWeather = FunctionDeclaration.newBuilder()
    .setName("getWeather")
    .setDescription("Returns the weather conditions at a location.")
    .setParameters(
        Schema.newBuilder()
            .setType(Type.OBJECT)
            .putProperties(
                "location",
                Schema.newBuilder()
                    .setType(Type.STRING)
                    .setDescription("The location for the weather report.")
                    .build())
            .build())
    .build();
var getTime = FunctionDeclaration.newBuilder()
    .setName("getTime")
    .setDescription("Returns the current time in the given timezone.")
    .setParameters(
        Schema.newBuilder()
            .setType(Type.OBJECT)
            .putProperties(
                "timezone",
                Schema.newBuilder()
                    .setType(Type.STRING)
                    .setDescription("The timezone to get the time from.")
                    .build())
            .build())
    .build();
Add the function declarations to a Tool
object:
var tool = Tool.newBuilder()
    .addFunctionDeclarations(getWeather)
    .addFunctionDeclarations(getTime)
    .build();
Create the inference backend
Create an inference backend using the LLM Inference API and pass it a formatter object for your model. The FC SDK formatter (ModelFormatter) acts as both a formatter and parser. Since this quickstart uses Gemma-3 1B, use GemmaFormatter:
var llmInferenceOptions = LlmInferenceOptions.builder()
    .setModelPath(modelFile.getAbsolutePath())
    .build();

var llmInference = LlmInference.createFromOptions(context, llmInferenceOptions);
var llmInferenceBackend = new LlmInferenceBackend(llmInference, new GemmaFormatter());
For more information, see the LLM Inference configuration options.
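For example, you can set additional options on the same builder. setMaxTokens limits the total number of tokens (input plus output) the model handles; the value below is only illustrative:

var llmInferenceOptions = LlmInferenceOptions.builder()
    .setModelPath(modelFile.getAbsolutePath())
    .setMaxTokens(1024)  // Illustrative limit; tune it for your use case.
    .build();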
Instantiate the model
Use the GenerativeModel
object to connect the inference backend, system
prompt, and tools. We already have the inference backend and tools, so we only
need to create the system prompt:
var systemInstruction = Content.newBuilder()
    .setRole("system")
    .addParts(Part.newBuilder().setText("You are a helpful assistant."))
    .build();
Instantiate the model with GenerativeModel
:
var generativeModel = new GenerativeModel(
    llmInferenceBackend,
    systemInstruction,
    List.of(tool));
Start a chat session
For simplicity, this quickstart starts a single chat session. You can also create multiple, independent sessions.
Using the new instance of GenerativeModel
, start a chat session:
var chat = generativeModel.startChat();
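If your app needs more than one conversation, each call to startChat() returns an independent session; a minimal sketch (the variable names are only illustrative):

// Each session is independent of the others.
var weatherChat = generativeModel.startChat();
var calendarChat = generativeModel.startChat();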
Send prompts to the model through the chat session, using the sendMessage
method:
var response = chat.sendMessage("How's the weather in San Francisco?");
Parse the model response
After passing a prompt to the model, the application must examine the response to determine whether to make a function call or output natural language text.
// Extract the model's message from the response.
var message = response.getCandidates(0).getContent().getParts(0);
// If the message contains a function call, execute the function.
if (message.hasFunctionCall()) {
  var functionCall = message.getFunctionCall();
  var args = functionCall.getArgs().getFieldsMap();
  String result = null;
  // Call the appropriate function.
  switch (functionCall.getName()) {
    case "getWeather":
      result = ToolsForLlm.getWeather(args.get("location").getStringValue());
      break;
    case "getTime":
      result = ToolsForLlm.getTime(args.get("timezone").getStringValue());
      break;
    default:
      throw new RuntimeException("Function does not exist: " + functionCall.getName());
  }
  // Return the result of the function call to the model.
  var functionResponse =
      FunctionResponse.newBuilder()
          .setName(functionCall.getName())
          .setResponse(
              Struct.newBuilder()
                  .putFields("result", Value.newBuilder().setStringValue(result).build()))
          .build();
  response = chat.sendMessage(functionResponse);
} else if (message.hasText()) {
  // Log the natural language response. Replace the tag with your app's log tag.
  Log.i("FunctionCalling", message.getText());
}
The sample code is an overly simplified implementation. For more information on how an application could examine model responses, see Formatting and Parsing.
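For instance, because the model may answer a function result with another function call, an application will often handle responses in a loop rather than a single if/else. The following is a rough sketch of that pattern, using only the calls shown above; the loop bound and log tag are arbitrary placeholders:

var response = chat.sendMessage("How's the weather in San Francisco?");

// Keep resolving function calls until the model produces a text answer.
// The iteration cap is an arbitrary safeguard against endless tool loops.
for (int i = 0; i < 5; i++) {
  var part = response.getCandidates(0).getContent().getParts(0);
  if (part.hasFunctionCall()) {
    var call = part.getFunctionCall();
    var args = call.getArgs().getFieldsMap();
    String result;
    switch (call.getName()) {
      case "getWeather":
        result = ToolsForLlm.getWeather(args.get("location").getStringValue());
        break;
      case "getTime":
        result = ToolsForLlm.getTime(args.get("timezone").getStringValue());
        break;
      default:
        throw new RuntimeException("Function does not exist: " + call.getName());
    }
    var functionResponse = FunctionResponse.newBuilder()
        .setName(call.getName())
        .setResponse(Struct.newBuilder()
            .putFields("result", Value.newBuilder().setStringValue(result).build()))
        .build();
    // Feed the result back to the model and check the next response.
    response = chat.sendMessage(functionResponse);
  } else if (part.hasText()) {
    Log.i("FunctionCalling", part.getText());
    break;
  }
}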
How it works
This section provides more in-depth information on the core concepts and components of the Function Calling SDK for Android.
Models
The Function Calling SDK requires a model with a formatter and parser. The FC SDK contains built-in formatters and parsers for its supported models, such as GemmaFormatter for Gemma models; see the Models documentation for the full list.
To use a different model with the FC SDK, you must develop your own formatter and parser that is compatible with the LLM Inference API.
Formatting and parsing
A key part of function calling support is the formatting of prompts and parsing
of model output. While these are two separate processes, the FC SDK handles both
formatting and parsing with the ModelFormatter
interface.
The formatter is responsible for converting the structured function declarations into text, formatting function responses, and inserting tokens to indicate the start and end of conversation turns, as well as the roles of those turns (e.g. "user", "model").
The parser is responsible for detecting whether the model response contains a function call. If the parser detects a function call, it parses it into a structured data type. Otherwise, it treats the text as a natural language response.
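As a rough illustration, a model turn that the parser would recognize as a function call might look similar to the following. The exact turn markers and call syntax vary by model and formatter; this sketch only assumes the ```tool_code fenced style referenced in the constrained decoding section below, and the call text itself is hypothetical:

```tool_code
getWeather(location="San Francisco")
```

The parser converts such a turn into a structured function call (here, a call named getWeather with a location argument); any other output is passed through as natural language text.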
Constrained decoding
Constrained decoding is a technique that guides the output generation of LLMs to ensure it adheres to a predefined structured format, such as JSON objects or Python function calls. By enforcing these constraints, the model formats its outputs in a way that aligns with the predefined functions and their corresponding parameter types.
To enable constrained decoding, define the constraints in a ConstraintOptions
object and invoke the enableConstraint
method of a ChatSession
instance.
When enabled, this constraint will restrict the response to only include the
tools associated with the GenerativeModel
.
The following example demonstrates how to configure constrained decoding to
restrict the response to tool calls. It constrains the tool call to start with
the prefix ```tool_code\n
and end with the suffix
\n```
.
ConstraintOptions constraintOptions = ConstraintOptions.newBuilder()
    .setToolCallOnly(ConstraintOptions.ToolCallOnly.newBuilder()
        .setConstraintPrefix("```tool_code\n")
        .setConstraintSuffix("\n```"))
    .build();

chatSession.enableConstraint(constraintOptions);
To disable the active constraint within the same session, use the
disableConstraint
method:
chatSession.disableConstraint();