Deploy an AI model as a micro-service with Seldon

Tam NguyenVan
Feb 27, 2021

AI is growing fast and helping people in many areas. We can easily grab a pre-trained model to solve a specific problem, or spend a little time fine-tuning a model on a new problem.

The difficulty, however, is how to deploy an AI model in production to serve many requests from clients.

In this article, we start with a simple way to deploy an AI model at scale in a micro-service architecture with Seldon.

What is Seldon?

It is an enterprise platform for ML deployment at scale (see https://www.seldon.io/).

It offers a super simple way to make a service that supports both REST API and gRPC protocols for communication.

Let’s start!

First, let's prepare a simple AI model from TensorFlow Hub. We start with an object detection model, which is a very popular problem.

Import the necessary packages:

# For running inference on the TF-Hub module.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

Then, load the pretrained model from TensorFlow Hub. Here, I use an SSD MobileNet model:

model_url = 'https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2'
hub_model = hub.load(model_url)

Finally, run inference on a single image:

# By Heiko Gorski, Source: https://commons.wikimedia.org/wiki/File:Naxos_Taverna.jpg
import cv2
image_np = cv2.imread("Naxos_Taverna.jpg")

# running inference
results = hub_model(np.expand_dims(image_np, axis=0))
result = {key: value.numpy() for key, value in results.items()}

Optionally, you can use some utility functions to visualize the result, which you can find here:

https://gist.github.com/Phelan164/262c8c50781b832ef653789d7aab2f3f
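If you do not want to pull in the full gist, below is a minimal sketch of such a visualization. It assumes the standard output keys of TF Hub object detection models (detection_boxes with normalized [ymin, xmin, ymax, xmax] coordinates and detection_scores); the helper name draw_boxes is just an example.

import cv2

def draw_boxes(image_np, result, score_threshold=0.5):
    """Draw detection boxes from the model output onto the image and return it."""
    h, w, _ = image_np.shape
    boxes = result["detection_boxes"][0]    # normalized [ymin, xmin, ymax, xmax]
    scores = result["detection_scores"][0]
    for box, score in zip(boxes, scores):
        if score < score_threshold:
            continue
        ymin, xmin, ymax, xmax = box
        cv2.rectangle(image_np,
                      (int(xmin * w), int(ymin * h)),
                      (int(xmax * w), int(ymax * h)),
                      color=(0, 255, 0), thickness=2)
    return image_np

cv2.imwrite("Naxos_Taverna_detected.jpg", draw_boxes(image_np, result))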

Result image (the detections are not perfect because of the model, but that is not important here).

So with a few lines of code we have a model that detects objects. What's next?

Deploy this model with Seldon to make a service that receives an image as input and returns the detected objects in the response.

1. Wrap the AI model in a template class that has a predict method

Note: the file name needs to be the same as the class name.

### WrapperModel.py ###
import numpy as np
import tensorflow_hub as hub


class WrapperModel(object):
    """
    Model template. You can load your model parameters in __init__ from a location accessible at runtime.
    """

    def __init__(self):
        """
        Add any initialization parameters. These will be passed at runtime from the graph definition parameters defined in your SeldonDeployment Kubernetes resource manifest.
        """
        print("Initializing")
        self.hub_model = hub.load('https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2')

    def predict(self, X, features_names=None):
        """
        Return a prediction.

        Parameters
        ----------
        X : array-like
        features_names : array of feature names (optional)
        """
        print("Predict called - running object detection")
        # The TF Hub SSD model expects a uint8 tensor; JSON payloads arrive as floats
        X = np.asarray(X, dtype=np.uint8)
        results = self.hub_model(np.expand_dims(X, axis=0))
        # Convert tensors to plain Python lists so the response is JSON-serializable
        return {key: value.numpy().tolist() for key, value in results.items()}
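Before building an image, it is worth sanity-checking the wrapper locally. A minimal sketch (the file name test_wrapper.py and the test image are just examples):

### test_wrapper.py ###
import cv2
from WrapperModel import WrapperModel

model = WrapperModel()                       # loads the TF Hub model
image_np = cv2.imread("Naxos_Taverna.jpg")   # any test image works
result = model.predict(image_np)
print(list(result.keys()))                   # e.g. detection_boxes, detection_scores, ...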

2. Define a Dockerfile, which needs to install seldon-core, tensorflow, and tensorflow-hub

### Dockerfile ###
FROM python:3.7-slim
COPY . /app
WORKDIR /app
RUN pip install seldon-core tensorflow tensorflow-hub
EXPOSE 5000
# Define environment variables
ENV MODEL_NAME WrapperModel
ENV SERVICE_TYPE MODEL
ENV PERSISTENCE 0
CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE --persistence $PERSISTENCE

3. Build the Docker image and run it

docker build . -t seldon-example
docker run -p 5000:5000 -e MODEL_NAME=WrapperModel seldon-example:latest

Once the service has launched, we can test it with the health-check API:

curl localhost:5000/health/ping --output -

Now we are almost done. Let's test the service with a prediction request.
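A minimal sketch of such a request, assuming the standard Seldon REST protocol where the input goes under data.ndarray (sending a whole image as a nested JSON list is fine for a quick test, although inefficient for production):

import cv2
import requests

image_np = cv2.imread("Naxos_Taverna.jpg")
payload = {"data": {"ndarray": image_np.tolist()}}

response = requests.post("http://localhost:5000/api/v1.0/predictions", json=payload)
print(response.status_code)
print(response.json())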
