Trying out Cognitive Services in containers on a SQL Server 2019 Big Data Cluster at SE の雑記

A SQL Server 2019 Big Data Cluster is a data analytics platform built on Kubernetes.

I did a quick test to see what work is needed to integrate Cognitive Services in containers into this platform.
For this test, everything is deployed on a k8s cluster that I built on NUC hardware.

First, Cognitive Services in containers has to be deployed onto the k8s infrastructure that hosts the Big Data Cluster.
The manifest file I used looks like this.

kind: Namespace
apiVersion: v1
metadata:
  name: cognitive
---
apiVersion: v1
kind: Secret
metadata:
  name: cognitive-apikey
  # A Secret must live in the same namespace as the Pod that references it
  namespace: cognitive
type: Opaque
data:
  apikey: <Base64-encoded API key>
---
kind: Pod
apiVersion: v1
metadata:
  name: cognitive-keyphrase
  namespace: cognitive
  labels:
    name: cognitive-keyphrase
spec:
  containers:
  - name: myapp
    image: mcr.microsoft.com/azure-cognitive-services/keyphrase
    ports:
      - containerPort: 5000
    env:
      - name: Billing
        value: "https://japaneast.api.cognitive.microsoft.com/text/analytics/v2.0"
      - name: ApiKey
        valueFrom:
          secretKeyRef:
            name: cognitive-apikey
            key: apikey
    args: ["Eula=accept", "Billing=$(Billing)", "ApiKey=$(ApiKey)"]
---
kind: Service
apiVersion: v1
metadata:
  name: service-cognitive
  namespace: cognitive
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 5000
  selector:
    name: cognitive-keyphrase

The API key is stored in a Secret. As described in "Creating a Secret Manually", the value set there has to be the Base64-encoded API key.
Applying this manifest deploys the Pod and the Service into the "cognitive" namespace.
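The Base64 step for the Secret can be sketched in Python as follows. This is only a minimal illustration: the function names and the placeholder key are my own assumptions, not part of the original setup. One thing worth noting is that any trailing newline in the key gets encoded too, so the key is passed in as an exact string.

```python
import base64


def encode_secret_value(api_key: str) -> str:
    """Return the Base64 text to put under `data:` in a Secret manifest."""
    return base64.b64encode(api_key.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding, e.g. to double-check what a Secret contains."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical placeholder, not a real Cognitive Services key.
    placeholder_key = "0123456789abcdef"
    print(encode_secret_value(placeholder_key))
```

The printed string is what would go in place of the `apikey` value in the manifest.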

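Because the Service is created as "ClusterIP", it cannot be reached directly from outside the cluster. To connect remotely and check that it works, I ran the following command on a Windows machine where kubectl is available and used port forwarding from the local machine.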

kubectl port-forward pod/cognitive-keyphrase 30005:5000 -n cognitive

With this in place, you can connect to "localhost:30005" and verify the container in advance.
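A check against the forwarded port could look roughly like the sketch below, using only the Python standard library. The path is the same one used later in this post for the key phrase endpoint; the helper names and the request body layout (string "id", "language", "text") are my assumptions based on the Text Analytics v2 request format.

```python
import json
import urllib.request

# Local end of the kubectl port-forward shown above.
BASE_URL = "http://localhost:30005"
KEY_PHRASES_PATH = "/text/analytics/v2.0/keyPhrases"


def build_request(texts, base_url=BASE_URL):
    """Build a POST request carrying one document per input text."""
    documents = [
        {"id": str(i), "language": "en", "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        base_url + KEY_PHRASES_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_key_phrases(texts, base_url=BASE_URL):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(texts, base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

While the port forward is running, calling `post_key_phrases(["The rooms were wonderful and the staff was helpful."])` should return the container's response as a dictionary.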

Next, set things up so that this service can be called through the Big Data Cluster.

For this I will use ML Services on Linux, which is also a new feature in SQL Server 2019.
In SQL Server 2017, SQL Server on Linux supported machine learning only through the PREDICT function. In SQL Server 2019, ML Services (In-Database) is available on SQL Server on Linux as well.

As a result, Python can be run inside SQL Server.
In a Big Data Cluster, the SQL Server interface is the Master Instance, and SQL Server on Linux ML Services is deployed on that instance, so "sp_execute_external_script" can be executed there.

ML Services is disabled by default, so first run the following query to enable it.

EXEC sp_configure 'external scripts enabled', 1
RECONFIGURE

With the preparation done, run the following query.
The Python code is based on "Quickstart: Using Python to call the Text Analytics Cognitive Service".

DECLARE @ret_value nvarchar(max)
DECLARE @query nvarchar(max) = '
SELECT value FROM
(VALUES
	(''I had a wonderful experience! The rooms were wonderful and the staff was helpful.''),
	(''I had a terrible time at the hotel. The staff was rude and the food was awful.'')
) AS T(value)
'
exec sp_execute_external_script
@language =N'Python',
@script=N'
import requests
language_api_url = "http://service-cognitive.cognitive:5000/text/analytics/v2.0/keyPhrases"
headers = {"Content-Type": "application/json"}
documents = {"documents": []}
# Build one document per input row; ids start at 1 and are sent as strings
for cnt, (_, row) in enumerate(InputDataSet.iterrows(), start=1):
    documents["documents"].append({"id": str(cnt), "language": "en", "text": row["value"]})
response = requests.post(language_api_url, headers=headers, json=documents)
language = response.json()
ret = str(language)
',
@input_data_1 = @query,
@params = N' @ret varchar(max) OUTPUT',
@ret = @ret_value OUTPUT
SELECT @ret_value

If the query runs successfully, you should get back the JSON result of the analysis performed by Cognitive Services in containers.
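To make the result easier to work with, the response can be flattened into rows of document id and key phrase. The sketch below is only an illustration: the function name is my own, and the sample response is a hypothetical one that assumes the Text Analytics v2 key phrase layout (a "documents" list whose items carry "id" and "keyPhrases"). It is not actual output from the container.

```python
def flatten_key_phrases(response: dict):
    """Turn a key phrase response into a list of (document id, phrase) rows."""
    rows = []
    for doc in response.get("documents", []):
        for phrase in doc.get("keyPhrases", []):
            rows.append((str(doc.get("id")), phrase))
    return rows


# Hypothetical response, written by hand to show the assumed shape.
SAMPLE_RESPONSE = {
    "documents": [
        {"id": "1", "keyPhrases": ["wonderful experience", "rooms", "staff"]},
        {"id": "2", "keyPhrases": ["terrible time", "hotel"]},
    ],
    "errors": [],
}

if __name__ == "__main__":
    for doc_id, phrase in flatten_key_phrases(SAMPLE_RESPONSE):
        print(doc_id, phrase)
```

Inside the sp_execute_external_script call, rows like these could be wrapped in a pandas DataFrame and assigned to OutputDataSet so that the result comes back as a table rather than a single string.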

A Big Data Cluster is a data analytics platform that ships with the Hadoop ecosystem and Spark, but it is built on k8s container technology.

So when you need an analysis method beyond the ones that ship with it, it should be possible to add one by adding a container that provides the required functionality.
