Deploy an AI model as a micro-service with Seldon
AI is growing fast and helping people in many areas. We can easily grab a pre-trained model to solve a specific problem, or spend a little time fine-tuning a model on a new one.
But the difficulty is how to deploy an AI model in production to serve many requests from clients.
In this article, we start with a simple way to deploy an AI model at scale in a micro-service architecture with Seldon.
What is Seldon?
It is the enterprise platform for ML deployment at scale (see https://www.seldon.io/).
It gives you a super simple way to build a service that supports both a REST API and gRPC for communication.
Let’s start!
First, let's prepare a simple AI model from TensorFlow Hub. We start with an object detection model, since object detection is a very popular problem.
Import the necessary libraries:
# For running inference on the TF-Hub module.
import tensorflow as tf
import tensorflow_hub as hub
Then, load the pretrained model from TensorFlow Hub. Here, I use the SSD MobileNet model:
model_url = 'https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2'
hub_model = hub.load(model_url)
Finally, run inference on a single image:
import cv2
import numpy as np

image_np = cv2.imread("Naxos_Taverna.jpg")

# running inference
results = hub_model(np.expand_dims(image_np, axis=0))
result = {key: value.numpy() for key, value in results.items()}
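To get a feel for what the model returns, you can inspect the result dictionary. The sketch below assumes the standard TF Object Detection API output keys (detection_scores, detection_classes, detection_boxes); adjust the names if your model version differs.

# Inspect detections above a confidence threshold (a sketch; key names are
# assumed to follow the TF Object Detection API output format)
scores = result['detection_scores'][0]
classes = result['detection_classes'][0]
boxes = result['detection_boxes'][0]

for score, cls, box in zip(scores, classes, boxes):
    if score > 0.5:
        print(f"class id: {int(cls)}, score: {score:.2f}, box (ymin, xmin, ymax, xmax): {box}")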
Optionally, you can find some utility functions to visualize the results here:
https://gist.github.com/Phelan164/262c8c50781b832ef653789d7aab2f3f
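If you just want a quick look without pulling in the gist, a minimal OpenCV sketch like the one below can draw the boxes. It assumes normalized (ymin, xmin, ymax, xmax) box coordinates, which is what TF Object Detection API models typically return.

import cv2

# Draw boxes for detections above a 0.5 score threshold
h, w = image_np.shape[:2]
for score, box in zip(result['detection_scores'][0], result['detection_boxes'][0]):
    if score > 0.5:
        ymin, xmin, ymax, xmax = box
        cv2.rectangle(image_np,
                      (int(xmin * w), int(ymin * h)),
                      (int(xmax * w), int(ymax * h)),
                      (0, 255, 0), 2)
cv2.imwrite("Naxos_Taverna_detected.jpg", image_np)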
So with a few lines of code we have a model that can detect objects. What's next?
We deploy this model with Seldon to make a service that receives an image as input and returns the detected objects as the response.
1. Wrap the AI model in a template class that has a predict method.
Note: the file name needs to be the same as the class name.
### WrapperModel.py ###
import numpy as np
import tensorflow_hub as hub


class WrapperModel(object):
    """
    Model template. You can load your model parameters in __init__ from a
    location accessible at runtime.
    """

    def __init__(self):
        """
        Add any initialization parameters. These will be passed at runtime from
        the graph definition parameters defined in your SeldonDeployment
        Kubernetes resource manifest.
        """
        print("Initializing")
        self.hub_model = hub.load('https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2')

    def predict(self, X, features_names=None):
        """
        Return a prediction.

        Parameters
        ----------
        X : array-like
        features_names : array of feature names (optional)
        """
        print("Predict called - running object detection")
        results = self.hub_model(np.expand_dims(X, axis=0))
        # Convert the output tensors to plain Python lists so the response can be serialized
        return {key: value.numpy().tolist() for key, value in results.items()}
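Before building the image, it's worth a quick local smoke test of the wrapper. A minimal sketch, run from the same directory as WrapperModel.py and the sample image:

import cv2
from WrapperModel import WrapperModel

# Instantiate the wrapper and run a single prediction locally
model = WrapperModel()
image_np = cv2.imread("Naxos_Taverna.jpg")
output = model.predict(image_np)
# Should print the detection output keys, e.g. detection_boxes, detection_scores, ...
print(output.keys())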
2. Define a Dockerfile, which needs to install seldon-core, tensorflow and tensorflow-hub:
### Dockerfile ###
FROM python:3.7-slim
COPY . /app
WORKDIR /app
RUN pip install seldon-core tensorflow tensorflow-hub
EXPOSE 5000

# Define environment variables
ENV MODEL_NAME WrapperModel
ENV SERVICE_TYPE MODEL
ENV PERSISTENCE 0

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE --persistence $PERSISTENCE
3. Build the Docker image and run it:
docker build . -t seldon-example
docker run -p 5000:5000 -e MODEL_NAME=WrapperModel seldon-example:latest
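Alternatively, if you want to sanity-check the microservice without Docker first, you can install the same dependencies and run the command from the Dockerfile's CMD directly (from the directory containing WrapperModel.py):

pip install seldon-core tensorflow tensorflow-hub
seldon-core-microservice WrapperModel --service-type MODEL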
After the service has launched, we test it with the health-check API:
curl localhost:5000/health/ping --output -
Now we're almost done. Let's test the service with a prediction request.
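Here is a minimal sketch of such a request. The /api/v1.0/predictions path and the {"data": {"ndarray": ...}} payload follow Seldon's REST protocol for Python-wrapped models; double-check both against your seldon-core version, and note that depending on how Seldon decodes the payload you may need to cast X back to uint8 inside predict.

import cv2
import requests

image_np = cv2.imread("Naxos_Taverna.jpg")
# Send the raw image as an ndarray payload (large, but keeps the example simple)
payload = {"data": {"ndarray": image_np.tolist()}}

response = requests.post("http://localhost:5000/api/v1.0/predictions", json=payload)
print(response.json())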