新增

创建一个雇员目录

对于员工目录，我们将做如下操作：

每个员工索引一个文档，文档包含该员工的所有信息。
每个文档都将是 employee 类型。
该类型位于索引 megacorp 内。
该索引保存在我们的 Elasticsearch 集群中。

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

注意，路径 /megacorp/employee/1 包含了三部分的信息：

megacorp

索引名称
employee

类型名称
1

特定雇员的ID

请求体 —— JSON 文档 —— 包含了这位员工的所有详细信息，他的名字叫 John Smith ，今年 25 岁，喜欢攀岩。

进行下一步前，让我们增加更多的员工信息到目录中：

PUT /megacorp/employee/2
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

查询

# 查询员工
GET /megacorp/employee/1
# 查询结果
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 6,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 19,
    "about" : "I love to go rock climbing",
    "interests" : [
      "sports",
      "music"
    ]
  }
}

将 HTTP 命令由 PUT 改为 GET 可以用来检索文档，同样的，可以使用 DELETE 命令来删除文档，以及使用 HEAD 指令来检查文档是否存在。如果想更新已存在的文档，只需再次 PUT

轻量搜索

# 搜索所有雇员
GET /megacorp/employee/_search

索引库 megacorp 以及类型 employee，但与指定一个文档 ID 不同，这次使用 _search 。返回结果包括了所有三个文档，放在数组 hits 中

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "千羽",
          "last_name" : "千寻",
          "age" : 18,
          "about" : "我喜欢看NBA",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 20,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 19,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

使用 _search 端点，并将查询本身赋值给参数 q= 。

{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.7095568,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.7095568,
        "_source" : {
          "first_name" : "千羽",
          "last_name" : "千寻",
          "age" : 18,
          "about" : "我喜欢看NBA",
          "interests" : [
            "music"
          ]
        }
      }
    ]
  }
}

使用表达式搜索

GET /megacorp/employee/_search
{
  "query":{
    "match": {
      "last_name": "千寻"
    }
  }
}
# 查询结果
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.7095568,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.7095568,
        "_source" : {
          "first_name" : "千羽",
          "last_name" : "千寻",
          "age" : 18,
          "about" : "我喜欢看NBA",
          "interests" : [
            "music"
          ]
        }
      }
    ]
  }
}

更复杂的搜索

# 现在尝试下更复杂的搜索。 同样搜索姓氏为 Smith 的员工，但这次我们只需要年龄大于 30 的。查询需要稍作调整，使用过滤器 filter ，它支持高效地执行一个结构化查询。
GET /megacorp/employee/_search
{
  "query":{
    "bool": {
      "must": {
        "match":{
          "last_name" : "千寻"
        }
      },
      "filter": [
        {
          "range": {
            "age": {
              "gt": 17
            }
          }
        }
      ]
    }
  }
}

这部分与我们之前使用的 match 查询一样。这部分是一个 range 过滤器 ，它能找到年龄大于 30 的文档，其中 gt 表示大于(great than)。

全文搜索

截止目前的搜索相对都很简单：单个姓名，通过年龄过滤。现在尝试下稍微高级点儿的全文搜索——一项传统数据库确实很难搞定的任务。

搜索下NBA现役最老球员的员工：

GET /megacorp/employee/_search
{
  "query":{
    "match": {
      "about": "NBA现役最老球员"
    }
  }
}

高亮搜索

许多应用都倾向于在每个搜索结果中高亮部分文本片段，以便让用户知道为何该文档符合查询条件。在 Elasticsearch 中检索出高亮片段也很容易。

再次执行前面的查询，并增加一个新的 highlight 参数：

集群内的原理

集群健康

Elasticsearch 的集群监控信息中包含了许多的统计数据，其中最为重要的一项就是 集群健康 ，它在 status 字段中展示为 green 、 yellow 或者 red 。

GET /_cluster/health

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 90.9090909090909
}

status 字段是我们最关心的。

status 字段指示着当前集群在总体上是否工作正常。它的三种颜色含义如下：

green

所有的主分片和副本分片都正常运行。
yellow

所有的主分片都正常运行，但不是所有的副本分片都正常运行。
red

有主分片没能正常运行。

在本章节剩余的部分，我们将解释什么是主分片和副本分片，以及上面提到的这些颜色的实际意义。

文档元数据

一个文档不仅仅包含它的数据，也包含 元数据 —— 有关文档的信息。三个必须的元数据元素如下：

_index

文档在哪存放
_type

文档表示的对象类别
_id

文档唯一标识

_index

一个索引应该是因共同的特性被分组到一起的文档集合。例如，你可能存储所有的产品在索引 products 中，而存储所有销售的交易到索引 sales 中。虽然也允许存储不相关的数据到一个索引中，但这通常看作是一个反模式的做法。

实际上，在 Elasticsearch 中，我们的数据是被存储和索引在分片中，而一个索引仅仅是逻辑上的命名空间，这个命名空间由一个或者多个分片组合在一起。然而，这是一个内部细节，我们的应用程序根本不应该关心分片，对于应用程序而言，只需知道文档位于一个索引内。

取回一个文档

GET /website/blog/123?pretty

{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "我第一个博客",
    "text" : "正在学习es",
    "date" : "2023/11/13"
  }
}

返回文档的一部分

GET /website/blog/123?_source=title,text

{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "text" : "正在学习es",
    "title" : "我第一个博客"
  }
}