Elasticsearch and object orientation: the difference between the Object datatype and the Nested datatype

Elasticsearch has two data types called the Object datatype and the Nested datatype.

Object datatype | Elasticsearch Reference [6.6] | Elastic

Nested datatype | Elasticsearch Reference [6.6] | Elastic

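Before explaining them, you need to know a little about object-oriented thinking. Until quite recently, though, the author was someone asking "what on earth is object orientation?", so experts may find some of the ideas here off the mark. Please keep that in mind as you read.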

Structuring data in an object-oriented way

With a non-object-oriented approach, you would probably create data with a structure like the one below. This structure works, and there is nothing wrong with it.

{
  "first_name": "John",
  "last_name": "Smith"
}

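So what would the structure look like if we stored the data in an object-oriented way? first_name and last_name are both parts that make up a name, so it seems natural to hang them under "name".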

{
  "name": {
    "first": "John",
    "last": "Smith"
  }
}

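"Wait, isn't that more of a hassle?" you might think. Looking at this JSON, the nesting makes it a little more complex and there are more keys, so it seems to have nothing but drawbacks. (There was a time when I thought so too.)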

However, accessing the fields is hardly different from the flat version:

name.first
name.last
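As a rough illustration of why the access paths barely change, here is a minimal Python sketch that turns the nested document into the dot-separated field names shown above. to_dot_paths is a hypothetical helper for illustration, not part of Elasticsearch or any client library.

```python
from typing import Any


def to_dot_paths(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dot-separated field paths."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the inner object, extending the path with "."
            flat.update(to_dot_paths(value, path))
        else:
            flat[path] = value
    return flat


doc = {"name": {"first": "John", "last": "Smith"}}
print(to_dot_paths(doc))  # {'name.first': 'John', 'name.last': 'Smith'}
```

The nesting only adds a prefix to each field name; the leaf values themselves are addressed one by one just as in the flat version.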

Looking at the data differently

JSON alone is hard to read, so let's turn each version into a table.

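With the non-object-oriented approach from the beginning, the fields have a flat relationship, so the table looks like this. It is a familiar kind of table.

Non-object-oriented approach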
user	first_name (string)	last_name (string)
John Smith	John	Smith
Alice White	Alice	White

Now let's turn the JSON created with the object-oriented approach into a table.

Object-oriented approach (1)
user	name (object)
	first (string)	last (string)
John Smith	John	Smith
Alice White	Alice	White

Or perhaps you picture it like this.

Object-oriented approach (2)
user	name (object)
John Smith	first (string)	last (string)
	John	Smith
Alice White	first (string)	last (string)
	Alice	White

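If you use Excel or similar tools, you probably merge cells all the time, so this should be easy to picture. The non-object-oriented approach is also a structure you often see in databases, and it is not wrong. But thinking in an object-oriented way lets you give the fields a relationship to one another.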

If we think explicitly in terms of "objects", it might look like this. The "name" field does not hold the "first" and "last" data directly; it holds a "name Object", and that object in turn provides the "first" and "last" fields.

Object-oriented structure
user	name (object)
John Smith	"name" Object { first (string): John, last (string): Smith }
Alice White	"name" Object { first (string): Alice, last (string): White }

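With the non-object-oriented structure, you have to make two requests:

"Give me John Smith's last-name data and his first-name data."

Also, we humans can see that the last name and the first name together form a "name", but a machine probably cannot tell how the two are related.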

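With the object-oriented structure, you can simply ask:

"Give me John Smith's name data."

and then do whatever you like with it. And even if a machine does not know what a last name or a first name is, it can at least sense that they are something tied to a "name".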

Putting nested data into Elasticsearch

When you put nested data into Elasticsearch, dynamic mapping takes care of it for you even if you have not configured any mapping.

[Image: Loading sample data with Kibana Dev Tools]

PUT my_index/_doc/1
{"name": {"first": "John", "last": "Smith"}}

PUT my_index/_doc/2
{"name": {"first": "Alice", "last": "White"}}

GET my_index/_search

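The mapping looks like this. Normally a type specification comes right under the field name, but for a nested field, properties comes at ❶ instead. This is what Elasticsearch calls the Object datatype. The types themselves are set at ❷, where the lower-level values are stored.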

GET my_index/_mapping
{"my_index" : {"mappings" : {"_doc" : {"properties" : {"name" : {"properties" : {❶
              "first" : {❷
                "type" : "text",
                "fields" : {"keyword" : {"type" : "keyword",
                    "ignore_above" : 256}}},
              "last" : {❷
                "type" : "text",
                "fields" : {"keyword" : {"type" : "keyword",
                    "ignore_above" : 256}}}}}}}}}}
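The rule described above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Elasticsearch's implementation: infer_mapping is a hypothetical name, and it models only the two cases in this example (strings become text with a keyword sub-field capped by ignore_above 256, as in the mapping above, and inner objects get properties instead of type). Real dynamic mapping also handles numbers, dates, booleans and arrays, and the outer my_index / mappings / _doc wrapper is omitted.

```python
from typing import Any


def infer_mapping(doc: dict[str, Any]) -> dict[str, Any]:
    """Rough model of dynamic mapping for strings and inner objects only."""
    properties: dict[str, Any] = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            # Inner object: described by "properties", not by "type" (❶)
            properties[key] = infer_mapping(value)
        elif isinstance(value, str):
            # Leaf string: text plus a keyword sub-field (❷)
            properties[key] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        else:
            raise NotImplementedError(f"type not modeled: {type(value).__name__}")
    return {"properties": properties}


mapping = infer_mapping({"name": {"first": "John", "last": "Smith"}})
print(sorted(mapping["properties"]["name"]["properties"]))  # ['first', 'last']
```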

When searching with Lucene query syntax, join the parent and child field names with a dot (.).

name.first:John AND name.last:Smith

Understanding the Object datatype and the Nested datatype

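If you load data without defining a mapping, nested data becomes the Object datatype. Searches work normally and there seems to be no problem, but in fact the structure Elasticsearch uses internally can differ from the structure of the _source you sent.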

For example, suppose you make the students field an array and load multiple entries like this:

PUT my_index/_doc/1
{
  "students": [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"}
  ]
}

You might expect this to be stored exactly as you sent it, but internally Elasticsearch interprets it like this:

{"students": {"first": ["John", "Alice"],
    "last": ["Smith", "White"]}}
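This reinterpretation can be modeled as follows. It is a simplified sketch (flatten_object_array is hypothetical, and real indexing works on analyzed terms rather than raw strings), but it shows the essential point: the values from every object in the array are pooled per field.

```python
from typing import Any


def flatten_object_array(objects: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Model how an array of objects is indexed under the Object datatype."""
    merged: dict[str, list[Any]] = {}
    for obj in objects:
        for field, value in obj.items():
            # Values of the same field from every object land in one list,
            # so the link between one student's first and last name is lost.
            merged.setdefault(field, []).append(value)
    return merged


students = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
print(flatten_object_array(students))
# {'first': ['John', 'Alice'], 'last': ['Smith', 'White']}
```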

Searching for students.first:John or students.last:Smith matches correctly, so you may wonder whether there is a problem at all. But this can lead to wrong results.

For example, suppose we search for "John White", a person who does not actually exist.

Query that searches for "John White"

GET my_index/_search
{
  "query": {
    "query_string": {"query": "students.first:John AND students.last:White"}
  }
}

Since there is no one named "John White", we expect zero hits. From Elasticsearch's point of view, however, students.first:John is true and students.last:White is also true, so the document we just loaded is returned.

{"took" : 17,
  "timed_out" : false,
  "_shards" : {"total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0},
  "hits" : {"total" : 1,
    "max_score" : 0.5753642,
    "hits" : [{"_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {"students" : [{"first" : "John",
              "last" : "Smith"
            },
            {"first" : "Alice",
              "last" : "White"
            }]}}]}}

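The Nested datatype is the data type that preserves the data structure so that you do not get results like this. To treat nested objects as the Nested datatype, specify nested in the mapping. The mapping has to be in place before the data is loaded: if my_index already exists from the previous example, delete it first (DELETE my_index), because an existing object field cannot be changed to nested.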

PUT my_index/
{
  "mappings": {
    "_doc": {"properties": {"students": {"type": "nested"}}}
  }
}

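"Isn't the Nested datatype, which keeps the structure intact, better than the Object datatype?" you might think. But the Nested datatype has a restriction: nested fields cannot be searched directly with Lucene query syntax and the like, so casual searches such as students.first:John no longer work.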

For searching Nested datatype fields, Elasticsearch provides the Nested Query. You specify the parent of the nested fields with path and run the search from there.

Let's search the nested fields and check that no wrong results come back.

Nested Query samples

# Search for John Smith
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "students",
      "query": {"query_string": {"query": "students.first:John AND students.last:Smith"}}
    }
  }
}

# Search for John White (does not exist, so nothing is returned)
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "students",
      "query": {"query_string": {"query": "students.first:John AND students.last:White"}}
    }
  }
}

You should get the results you expect.
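The difference between the two kinds of search can be summarized with a small Python model. It is only a sketch of the matching semantics (the function names are hypothetical, and exact string equality stands in for real query_string matching): with the Object datatype each condition may be satisfied by a different array element, while a Nested Query requires both conditions to hold within the same element.

```python
from typing import Any

Student = dict[str, Any]


def matches_object(objects: list[Student], first: str, last: str) -> bool:
    """Object datatype: each condition may be met by a different object."""
    return any(o.get("first") == first for o in objects) and any(
        o.get("last") == last for o in objects
    )


def matches_nested(objects: list[Student], first: str, last: str) -> bool:
    """Nested datatype: both conditions must hold for the same object."""
    return any(o.get("first") == first and o.get("last") == last for o in objects)


students = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
print(matches_object(students, "John", "White"))  # True (the false positive)
print(matches_nested(students, "John", "White"))  # False
print(matches_nested(students, "John", "Smith"))  # True
```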

How it looks in Kibana

Kibana shows a warning that objects in arrays are not well supported, and the display is the same for both the Object datatype and the Nested datatype.

[Image: Kibana - handling of objects in arrays]

The lower-level fields are registered properly in Index Patterns as well, so you cannot tell whether a field is nested without looking at the mapping.

[Image: Kibana - Index Patterns]

Using Lucene syntax while keeping the benefits of the Nested datatype

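Not being able to use Lucene syntax with the Nested datatype also means you cannot search from places like the Kibana search bar, which hurts quite a bit. (Grafana had the same limitation, but a feature request has been raised in an issue, so it may be supported soon.)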

One workaround is to use copy_to to copy the values into top-level fields of your choice.

Below is an example mapping that copies students.first into first_names and students.last into last_names.

PUT my_index/
{"mappings": {"_doc": {"properties": {"students": {"type": "nested",
          "properties": {"first": {"type": "text",
              "copy_to": "first_names", ❶
              "fields": {"keyword": {"type": "keyword",
                  "ignore_above": 256}}},
            "last": {"type": "text",
              "copy_to": "last_names", ❷
              "fields": {"keyword": {"type": "keyword",
                  "ignore_above": 256}}}}}}}}}

As a tree, it would look something like this.

_doc
|
+-- students
|   |
|   +-- [0]
|   |   |
|   |   +-- first -> copy_to: first_names ❶
|   |   |
|   |   +-- last  -> copy_to: last_names ❷
|   |
|   +-- [1]
|       |
|       +-- first -> copy_to: first_names ❶
|       |
|       +-- last  -> copy_to: last_names ❷
|
+-- first_names ❶
|
+-- last_names ❷
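What copy_to does in this tree can be sketched as follows. This is an assumption-laden simplification (apply_copy_to is a hypothetical helper): in Elasticsearch the copied values are indexed into the target fields rather than written back into _source, and the list order here is only illustrative. Because each field maps to a list of destinations, one value can be copied to several fields.

```python
from typing import Any


def apply_copy_to(
    doc: dict[str, Any], path: str, copy_to: dict[str, list[str]]
) -> dict[str, list[Any]]:
    """Collect values of nested fields under `path` into top-level targets."""
    targets: dict[str, list[Any]] = {}
    for obj in doc.get(path, []):
        for field, dests in copy_to.items():
            if field in obj:
                for dest in dests:
                    # Append the value to every destination listed for this field
                    targets.setdefault(dest, []).append(obj[field])
    return targets


doc = {"students": [{"first": "John", "last": "Smith"},
                    {"first": "Alice", "last": "White"}]}
spec = {"first": ["first_names"], "last": ["last_names"]}
print(apply_copy_to(doc, "students", spec))
# {'first_names': ['John', 'Alice'], 'last_names': ['Smith', 'White']}
```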

If you load the name data and then look at the mapping, you can see that first_names and last_names fields, which do not exist in _source, have been added.

{"my_index" : {"mappings" : {"_doc" : {"properties" : {"first_names" : {"type" : "text",
            "fields" : {"keyword" : {"type" : "keyword",
                "ignore_above" : 256}}},
          "last_names" : {"type" : "text",
            "fields" : {"keyword" : {"type" : "keyword",
                "ignore_above" : 256}}},
          "students" : {"type" : "nested",
            "properties" : {"first" : {"type" : "text",
                "fields" : {"keyword" : {"type" : "keyword",
                    "ignore_above" : 256}},
                "copy_to" : ["first_names"
                ]},
              "last" : {"type" : "text",
                "fields" : {"keyword" : {"type" : "keyword",
                    "ignore_above" : 256}},
                "copy_to" : ["last_names"
                ]}}}}}}}}

As a table, it would look something like this.

my_index/_doc
students (object array)	first_names (array)	last_names (array)
Object[0] { first (string): John, last (string): Smith }	John -> /students[0]/first	Smith -> /students[0]/last
Object[1] { first (string): Alice, last (string): White }	Alice -> /students[1]/first	White -> /students[1]/last

When searching from Kibana, run the search against the fields created with copy_to.

first_names:John

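As an aside, copy_to accepts multiple destinations, so you can also create, for example, a field that combines both first and last names. As shown below, in addition to the fields for ❶ first-name search and ❷ last-name search, copy the values into a field for ❸ full-name search.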

        :
        "students": {"type": "nested",
          "properties": {"first": {"type": "text",
              "copy_to": ["first_names", ❶
                "full_names" ❸
              ],
              "fields": {"keyword": {"type": "keyword",
                  "ignore_above": 256}}},
            "last": {"type": "text",
              "copy_to": ["last_names", ❷
                "full_names" ❸
              ],
              "fields": {"keyword": {"type": "keyword",
                  "ignore_above": 256}}}}}
        :

Note: how to retrieve copy_to data

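Fields created with copy_to are not included in _source, so you need to retrieve them with script_fields or docvalue_fields.

The two differ in whether _source is returned by default, so set it explicitly as needed. Also note that the retrieved values are not necessarily in the same order as the array in _source.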

Request parameter	`_source` returned by default
`script_fields`	false
`docvalue_fields`	true
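The ordering caveat can be pictured with a tiny model. Assumption: for a keyword field, the doc values of one document behave like a sorted set of unique terms (the behavior of Lucene's sorted-set doc values); keyword_doc_values is a hypothetical name and this is not Elasticsearch code. For ASCII names like these, term order is plain alphabetical order, which is why Alice comes before John in the responses below.

```python
def keyword_doc_values(values: list[str]) -> list[str]:
    """Model keyword doc values as a sorted set of unique terms per document."""
    # Sorting replaces the original _source order; duplicates collapse
    return sorted(set(values))


source_order = ["John", "Alice"]  # order inside _source
print(keyword_doc_values(source_order))  # ['Alice', 'John']
```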

Query that outputs the copy_to destination field using script_fields

GET my_index/_search
{
  "_source": true,
  "script_fields": {
    "any_field_name": {"script": {"source": "doc['first_names.keyword'].values"}}
  }
}

{"took" : 38,
  "timed_out" : false,
  "_shards" : {"total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0},
  "hits" : {"total" : 1,
    "max_score" : 1.0,
    "hits" : [{"_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {"students" : [{"first" : "John",
              "last" : "Smith"
            },
            {"first" : "Alice",
              "last" : "White"
            }]},
        "fields" : {"any_field_name" : ["Alice",
            "John"
          ]}}]}}

With docvalue_fields, you simply specify the field name as-is. If you set format to use_field_mapping, the format is determined from the mapping.

Query that outputs the copy_to destination fields using docvalue_fields

GET my_index/_search
{"_source": false, 
  "docvalue_fields": [{"field": "first_names.keyword",
      "format": "use_field_mapping"
    },
    {"field": "last_names.keyword",
      "format": "use_field_mapping"
    }]}

{"took" : 13,
  "timed_out" : false,
  "_shards" : {"total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0},
  "hits" : {"total" : 1,
    "max_score" : 1.0,
    "hits" : [{"_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {"first_names.keyword" : ["Alice",
            "John"
          ],
          "last_names.keyword" : ["Smith",
            "White"
          ]}}]}}

Here's to a happy Elasticsearch life 😊
