Elasticsearch: Sorting Chinese Text

1 Requirements

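Chinese values need to be sorted alphabetically by their pinyin, starting from the first letter. Elasticsearch, however, compares strings by their raw byte order (Unicode code points, which match ASCII order only for ASCII characters). To a human reader, the resulting order of Chinese values therefore appears to follow no rule at all.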

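In addition, some fields already use ngram_analyzer or ik_analyzer. To keep those analyzers and also support pinyin sorting, use the fields (multi-fields) option in the index mapping so that the same field is analyzed in more than one way.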
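The following minimal Python sketch makes the problem concrete. Python's default string ordering compares Unicode code points, and for these characters that gives the same result as a byte-wise keyword sort. Ordering by a pinyin key gives the order a reader expects. The PINYIN table is a hypothetical, hand-written stand-in that covers only the sample characters. It is not the plugin's dictionary.

```python
# Hypothetical pinyin table covering only the sample characters (not a real dictionary).
PINYIN = {"一": "yi", "二": "er", "百": "bai", "办": "ban",
          "员": "yuan", "主": "zhu", "级": "ji"}


def pinyin_key(text: str) -> str:
    """Replace each known Chinese character with its pinyin; keep other characters."""
    return "".join(PINYIN.get(ch, ch) for ch in text)


words = ["一级", "二级", "百级", "办员", "主办", "级主"]

# Default str ordering compares Unicode code points, like a plain keyword sort.
by_code_point = sorted(words)
# Ordering by the pinyin key is what a human reader expects.
by_pinyin = sorted(words, key=pinyin_key)

print(by_code_point)  # ['一级', '主办', '二级', '办员', '百级', '级主']
print(by_pinyin)      # ['百级', '办员', '二级', '级主', '一级', '主办']
```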

2 Solution

2.1 Quickstart elasticsearch-analysis-pinyin

2.1.1 Install elasticsearch-analysis-pinyin

When Elasticsearch runs in Docker, install the plugin inside the container:

  • Run a command in a running container
  docker exec -it docker-elasticsearch bash

docker-elasticsearch is the container name. Replace it with the name of your own container.

  • Install elasticsearch-analysis-pinyin with elasticsearch-plugin (the plugin version must match the Elasticsearch version exactly, here 6.8.10)
  elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.8.10/elasticsearch-analysis-pinyin-6.8.10.zip
  • Restart the container
  docker restart docker-elasticsearch
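Elasticsearch refuses to load a plugin built for a different version, so the release you install must match the cluster version. The sketch below builds the download URL for a given version. It assumes that every release follows the same tag and file naming as the 6.8.10 link above, which is a hypothetical assumption. Check the project's releases page before relying on it.

```python
def pinyin_plugin_url(es_version: str) -> str:
    """Build a release URL following the pattern of the 6.8.10 link above.

    Assumption: other releases use the same tag and file naming.
    """
    return (
        "https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/"
        f"v{es_version}/elasticsearch-analysis-pinyin-{es_version}.zip"
    )


# Print the install command for the version used in this document.
print(f"elasticsearch-plugin install {pinyin_plugin_url('6.8.10')}")
```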

2.1.2 Test the analyzer on a Chinese name, such as 经理+J0001A01

  • Request
  ###
  POST {{host}}/_analyze
  Content-Type: application/json

  {
      "tokenizer":{
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese_in_joined_full_pinyin": true,
          "keep_none_chinese_in_first_letter": true,
          "none_chinese_pinyin_tokenize": false,
          "lowercase" : false
      },
      "text":"经理+J0001A01"
  }

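For the optional parameters, see reference [1]. The configuration above is chosen to produce a single token. That token is convenient for sorting, but it is not meant for searching.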

  • Response
  {
      "tokens":[
          {
              "token":"jingliJ0001A01",
              "start_offset":0,
              "end_offset":0,
              "type":"word",
              "position":0
          }
      ]
  }
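The following Python sketch approximates how the settings above turn 经理+J0001A01 into one token. Chinese characters become full pinyin, ASCII letters and digits are kept unsplit, and other characters are dropped. This is an illustration under assumptions and is not the plugin's algorithm. PINYIN is a hypothetical table for the sample characters. The punctuation handling is inferred from the tokens shown in this document (for example, foo-9 becomes foo9).

```python
# Hypothetical pinyin table: only the characters in the sample text.
PINYIN = {"经": "jing", "理": "li"}


def joined_full_pinyin(text: str, lowercase: bool = False) -> str:
    """Approximate the single sort token produced by the tokenizer settings above."""
    parts = []
    for ch in text:
        if ch in PINYIN:
            parts.append(PINYIN[ch])      # Chinese character -> full pinyin
        elif ch.isascii() and ch.isalnum():
            parts.append(ch)              # keep ASCII letters/digits unsplit
        # anything else (e.g. '+', '-', '_', '*') is dropped, matching the observed tokens
    token = "".join(parts)
    return token.lower() if lowercase else token


print(joined_full_pinyin("经理+J0001A01"))  # jingliJ0001A01
```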
  • Request
  ###
  POST {{host}}/_analyze
  Content-Type: application/json

  {
      "tokenizer":{
          "type" : "pinyin",
          "keep_first_letter":true,
          "keep_separate_first_letter" : false,
          "keep_full_pinyin" : false,
          "limit_first_letter_length" : 20,
          "lowercase" : false,
          "keep_none_chinese":false
      },
      "text":"中华人民共和国"
  }
  • Response
  {
      "tokens":[
          {
              "token":"zhrmghg",
              "start_offset":0,
              "end_offset":0,
              "type":"word",
              "position":0
          }
      ]
  }
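Here is a similarly rough sketch of the first-letter configuration for all-Chinese input. It takes the first letter of each character's pinyin and truncates the result to limit_first_letter_length. The PINYIN table is hypothetical. Non-Chinese characters are simply skipped, so the sketch does not model the plugin's keep_none_chinese* options.

```python
# Hypothetical pinyin table: only the characters in the sample text.
PINYIN = {"中": "zhong", "华": "hua", "人": "ren", "民": "min",
          "共": "gong", "和": "he", "国": "guo"}


def first_letters(text: str, limit: int = 20) -> str:
    """Approximate keep_first_letter for all-Chinese input, truncated like
    limit_first_letter_length. Characters not in the table are skipped."""
    letters = [PINYIN[ch][0] for ch in text if ch in PINYIN]
    return "".join(letters)[:limit]


print(first_letters("中华人民共和国"))  # zhrmghg
```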

2.2 Create an index using elasticsearch-analysis-pinyin

2.2.1 Add a new index

Define your own analyzers in the index settings, for example ngram_analyzer, ik_analyzer, pinyin_analyzer, and so on:

###
PUT {{host}}/bar
Content-Type: application/json

{
    "settings":{
        "analysis":{
            "analyzer":{
                "ngram_analyzer":{
                    "tokenizer":"ngram_tokenizer",
                    "filter":[
                        "lowercase"
                    ]
                },
                "ik_analyzer":{
                    "tokenizer":"ik_max_word",
                    "filter":[
                        "lowercase"
                    ]
                },
                "my_keyword":{
                    "tokenizer":"keyword",
                    "filter":["lowercase"]
                },
                "pinyin_analyzer":{
                    "tokenizer": "my_pinyin"
                }
            },
            "tokenizer":{
                "ngram_tokenizer":{
                    "type":"ngram",
                    "min_gram":1,
                    "max_gram":50
                },
                "my_pinyin":{
                    "type": "pinyin",
                    "keep_first_letter": false,
                    "keep_full_pinyin": false,
                    "keep_joined_full_pinyin": true,
                    "keep_none_chinese_in_joined_full_pinyin": true,
                    "keep_none_chinese_in_first_letter": true,
                    "none_chinese_pinyin_tokenize": false,
                    "lowercase" : false
                }
            }
        }
    }
}

Of these, ngram_analyzer, ik_analyzer, and my_keyword are existing configurations. pinyin_analyzer, together with its my_pinyin tokenizer, is the new addition.
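When a project already has analysis settings, the new pieces can be merged in programmatically. The sketch below uses a hypothetical helper, add_pinyin_analyzer, which is not part of any client library. It adds my_pinyin and pinyin_analyzer to a settings body and leaves existing analyzers alone. If you add them to an existing index rather than creating a new one like bar, note that Elasticsearch only accepts new analyzers while the index is closed.

```python
import copy
import json

# The my_pinyin tokenizer, copied from the settings above.
MY_PINYIN = {
    "type": "pinyin",
    "keep_first_letter": False,
    "keep_full_pinyin": False,
    "keep_joined_full_pinyin": True,
    "keep_none_chinese_in_joined_full_pinyin": True,
    "keep_none_chinese_in_first_letter": True,
    "none_chinese_pinyin_tokenize": False,
    "lowercase": False,
}


def add_pinyin_analyzer(body: dict) -> dict:
    """Return a copy of an index-settings body with my_pinyin and
    pinyin_analyzer added; existing analyzers are left untouched."""
    merged = copy.deepcopy(body)
    analysis = merged.setdefault("settings", {}).setdefault("analysis", {})
    analysis.setdefault("tokenizer", {})["my_pinyin"] = dict(MY_PINYIN)
    analysis.setdefault("analyzer", {})["pinyin_analyzer"] = {"tokenizer": "my_pinyin"}
    return merged


# A trimmed-down "existing" body containing only my_keyword, for illustration.
existing = {"settings": {"analysis": {"analyzer": {
    "my_keyword": {"tokenizer": "keyword", "filter": ["lowercase"]},
}}}}
updated = add_pinyin_analyzer(existing)
print(json.dumps(updated, indent=2))
```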

2.2.2 Define the document mapping

###
POST  {{host}}/bar/_doc/_mapping
Content-Type: application/json

{
    "properties":{
        "content":{
            "type":"text",
            "analyzer":"ik_analyzer",
            "search_analyzer":"ik_max_word",
            "fields":{
                "keyword":{
                    "ignore_above":256,
                    "type":"keyword"
                },
                "pinyin":{
                    "type":"text",
                    "fielddata":true,
                    "analyzer":"pinyin_analyzer",
                    "boost":100
                }
            }
        },
        "content1":{
            "type":"text",
            "analyzer":"ngram_analyzer",
            "search_analyzer":"my_keyword",
            "fields":{
                "keyword":{
                    "ignore_above":256,
                    "type":"keyword"
                },
                "pinyin":{
                    "type":"text",
                    "fielddata":true,
                    "analyzer":"pinyin_analyzer",
                    "boost":100
                }
            }
        },
        "content2":{
            "type":"nested",
            "properties":{
                "fooNested":{
                    "type":"text",
                    "analyzer":"ngram_analyzer",
                    "search_analyzer":"my_keyword",
                    "fields":{
                        "keyword":{
                            "ignore_above":256,
                            "type":"keyword"
                        },
                        "pinyin":{
                            "type":"text",
                            "fielddata":true,
                            "analyzer":"pinyin_analyzer",
                            "boost":100
                        }
                    }
                }
            }
        }
    }
}

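Because each field already has its own analyzer, multi-fields (fields) are used to analyze the same value in different ways [2]. For example, content1.keyword holds the raw value, and content1.pinyin holds the single pinyin token used for sorting. The pinyin sub-field needs "fielddata": true because it is a text field, and text fields cannot be sorted on without fielddata.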
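Every sortable field repeats the same keyword and pinyin sub-fields, so a small helper can generate them. This is a sketch. with_pinyin_sort_fields is a hypothetical name, and the body it builds mirrors the mapping above.

```python
import json


def with_pinyin_sort_fields(analyzer: str, search_analyzer: str) -> dict:
    """Build a text-field mapping that keeps its own analyzers and adds
    keyword and pinyin sub-fields, following the mapping above."""
    return {
        "type": "text",
        "analyzer": analyzer,
        "search_analyzer": search_analyzer,
        "fields": {
            "keyword": {"ignore_above": 256, "type": "keyword"},
            "pinyin": {
                "type": "text",
                "fielddata": True,  # required to sort on a text field
                "analyzer": "pinyin_analyzer",
                "boost": 100,
            },
        },
    }


properties = {
    "content": with_pinyin_sort_fields("ik_analyzer", "ik_max_word"),
    "content1": with_pinyin_sort_fields("ngram_analyzer", "my_keyword"),
    "content2": {
        "type": "nested",
        "properties": {
            "fooNested": with_pinyin_sort_fields("ngram_analyzer", "my_keyword"),
        },
    },
}
print(json.dumps({"properties": properties}, indent=2))
```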

2.2.3 Load test data

The test data covers digits, uppercase and lowercase English letters, Chinese, mixed Chinese and English, and so on.

###curl -X POST "localhost:9200/bar/_bulk" -H 'Content-Type: application/json' --data-binary @request
POST  {{host}}/_bulk
Content-Type: application/json

{"index": {"_index": "bar", "_type": "_doc", "_id": 1}}
{"content":"一级主办员其他","content1":"一级主办员其他","content2":{"fooNested":"一级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 2}}
{"content":"二级主办员其他","content1":"二级主办员其他","content2":{"fooNested":"二级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 3}}
{"content":"1级主办员其他","content1":"1级主办员其他","content2":{"fooNested":"1级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 4}}
{"content":"2级主办员其他","content1":"2级主办员其他","content2":{"fooNested":"2级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 5}}
{"content":"A级主办员其他","content1":"A级主办员其他","content2":{"fooNested":"A级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 6}}
{"content":"B级主办员其他","content1":"B级主办员其他","content2":{"fooNested":"B级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 7}}
{"content":"a级主办员其他","content1":"a级主办员其他","content2":{"fooNested":"a级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 8}}
{"content":"b级主办员其他","content1":"b级主办员其他","content2":{"fooNested":"b级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 9}}
{"content":"aa级主办员其他","content1":"aa级主办员其他","content2":{"fooNested":"aa级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 10}}
{"content":"百级主办员其他","content1":"百级主办员其他","content2":{"fooNested":"百级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 11}}
{"content":"级主","content1":"级主","content2":{"fooNested":"级主"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 12}}
{"content":"ad级主办员其他","content1":"ad级主办员其他","content2":{"fooNested":"ad级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 13}}
{"content":"ac级主办员其他","content1":"ac级主办员其他","content2":{"fooNested":"ac级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 14}}
{"content":"主办","content1":"主办","content2":{"fooNested":"主办"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 15}}
{"content":"办员","content1":"办员","content2":{"fooNested":"办员"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 16}}
{"content":"中国人民政府zhonguoshanghai13154123中国人民政府","content1":"中国人民政府zhonguoshanghai13154123中国人民政府","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123中国人民政府"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 17}}
{"content":"中国人民政府zhonguoshanghai13154123","content1":"中国人民政府zhonguoshanghai13154123","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 18}}
{"content":"king中国","content1":"king中国","content2":{"fooNested":"king中国"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 19}}
{"content":"经理J0001A01","content1":"经理J0001A01","content2":{"fooNested":"经理J0001A01"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 20}}
{"content":"foo-9","content1":"foo-9","content2":{"fooNested":"foo-9"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 21}}
{"content":"经理+J0001A01","content1":"经理+J0001A01","content2":{"fooNested":"经理+J0001A01"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 22}}
{"content":"ExternalId*+_38","content1":"ExternalId*+_38","content2":{"fooNested":"ExternalId*+_38"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 23}}
{"content":"ch_1008","content1":"ch_1008","content2":{"fooNested":"ch_1008"}}
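Hand-written bulk bodies are easy to break, for example through an unbalanced brace in an action line. The minimal sketch below generates the same newline-delimited body from the 23 test strings. bulk_body is a hypothetical helper. Its output could be saved as the request file used by the curl command above.

```python
import json

# The test strings from the bulk request above, in _id order (1..23).
TEXTS = [
    "一级主办员其他", "二级主办员其他", "1级主办员其他", "2级主办员其他",
    "A级主办员其他", "B级主办员其他", "a级主办员其他", "b级主办员其他",
    "aa级主办员其他", "百级主办员其他", "级主", "ad级主办员其他",
    "ac级主办员其他", "主办", "办员",
    "中国人民政府zhonguoshanghai13154123中国人民政府",
    "中国人民政府zhonguoshanghai13154123",
    "king中国", "经理J0001A01", "foo-9", "经理+J0001A01",
    "ExternalId*+_38", "ch_1008",
]


def bulk_body(index: str, texts: list) -> str:
    """Build an NDJSON bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, text in enumerate(texts, start=1):
        action = {"index": {"_index": index, "_type": "_doc", "_id": doc_id}}
        source = {"content": text, "content1": text, "content2": {"fooNested": text}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body("bar", TEXTS), end="")
```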

2.2.4 Sort by fields [3]

  • Request
  ###
  GET  {{host}}/bar/_search
  Content-Type: application/json

  {
      "query":{
          "match_all":{

          }
      },
      "from":0,
      "size":23,
      "sort":{
          "content1.pinyin":{
              "order":"asc"
          }
      }
  }
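  • Response. Note: each sort array holds two values, the raw string and then the pinyin token. The hits follow the raw string's code-point order, so this output appears to have been captured with an extra sort on content1.keyword placed before content1.pinyin.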
  {"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":23,"max_score":null,"hits":[
    {"_index":"bar","_type":"_doc","_id":"3","_score":null,"_source":{"content":"1级主办员其他","content1":"1级主办员其他","content2":{"fooNested":"1级主办员其他"}},"sort":["1级主办员其他","1jizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"4","_score":null,"_source":{"content":"2级主办员其他","content1":"2级主办员其他","content2":{"fooNested":"2级主办员其他"}},"sort":["2级主办员其他","2jizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"5","_score":null,"_source":{"content":"A级主办员其他","content1":"A级主办员其他","content2":{"fooNested":"A级主办员其他"}},"sort":["A级主办员其他","Ajizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"6","_score":null,"_source":{"content":"B级主办员其他","content1":"B级主办员其他","content2":{"fooNested":"B级主办员其他"}},"sort":["B级主办员其他","Bjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"22","_score":null,"_source":{"content":"ExternalId*+_38","content1":"ExternalId*+_38","content2":{"fooNested":"ExternalId*+_38"}},"sort":["ExternalId*+_38","ExternalId38"]},
    {"_index":"bar","_type":"_doc","_id":"9","_score":null,"_source":{"content":"aa级主办员其他","content1":"aa级主办员其他","content2":{"fooNested":"aa级主办员其他"}},"sort":["aa级主办员其他","aajizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"13","_score":null,"_source":{"content":"ac级主办员其他","content1":"ac级主办员其他","content2":{"fooNested":"ac级主办员其他"}},"sort":["ac级主办员其他","acjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"12","_score":null,"_source":{"content":"ad级主办员其他","content1":"ad级主办员其他","content2":{"fooNested":"ad级主办员其他"}},"sort":["ad级主办员其他","adjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"7","_score":null,"_source":{"content":"a级主办员其他","content1":"a级主办员其他","content2":{"fooNested":"a级主办员其他"}},"sort":["a级主办员其他","ajizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"8","_score":null,"_source":{"content":"b级主办员其他","content1":"b级主办员其他","content2":{"fooNested":"b级主办员其他"}},"sort":["b级主办员其他","bjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"23","_score":null,"_source":{"content":"ch_1008","content1":"ch_1008","content2":{"fooNested":"ch_1008"}},"sort":["ch_1008","ch1008"]},
    {"_index":"bar","_type":"_doc","_id":"20","_score":null,"_source":{"content":"foo-9","content1":"foo-9","content2":{"fooNested":"foo-9"}},"sort":["foo-9","foo9"]},
    {"_index":"bar","_type":"_doc","_id":"18","_score":null,"_source":{"content":"king中国","content1":"king中国","content2":{"fooNested":"king中国"}},"sort":["king中国","kingzhongguo"]},
    {"_index":"bar","_type":"_doc","_id":"1","_score":null,"_source":{"content":"一级主办员其他","content1":"一级主办员其他","content2":{"fooNested":"一级主办员其他"}},"sort":["一级主办员其他","yijizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"17","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123","content1":"中国人民政府zhonguoshanghai13154123","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123"}},"sort":["中国人民政府zhonguoshanghai13154123","zhongguorenminzhengfuzhonguoshanghai13154123"]},
    {"_index":"bar","_type":"_doc","_id":"16","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123中国人民政府","content1":"中国人民政府zhonguoshanghai13154123中国人民政府","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123中国人民政府"}},"sort":["中国人民政府zhonguoshanghai13154123中国人民政府","zhongguorenminzhengfuzhonguoshanghai13154123zhongguorenminzhengfu"]},
    {"_index":"bar","_type":"_doc","_id":"14","_score":null,"_source":{"content":"主办","content1":"主办","content2":{"fooNested":"主办"}},"sort":["主办","zhuban"]},
    {"_index":"bar","_type":"_doc","_id":"2","_score":null,"_source":{"content":"二级主办员其他","content1":"二级主办员其他","content2":{"fooNested":"二级主办员其他"}},"sort":["二级主办员其他","erjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"15","_score":null,"_source":{"content":"办员","content1":"办员","content2":{"fooNested":"办员"}},"sort":["办员","banyuan"]},
    {"_index":"bar","_type":"_doc","_id":"10","_score":null,"_source":{"content":"百级主办员其他","content1":"百级主办员其他","content2":{"fooNested":"百级主办员其他"}},"sort":["百级主办员其他","baijizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"11","_score":null,"_source":{"content":"级主","content1":"级主","content2":{"fooNested":"级主"}},"sort":["级主","jizhu"]},
    {"_index":"bar","_type":"_doc","_id":"21","_score":null,"_source":{"content":"经理+J0001A01","content1":"经理+J0001A01","content2":{"fooNested":"经理+J0001A01"}},"sort":["经理+J0001A01","jingliJ0001A01"]},
    {"_index":"bar","_type":"_doc","_id":"19","_score":null,"_source":{"content":"经理J0001A01","content1":"经理J0001A01","content2":{"fooNested":"经理J0001A01"}},"sort":["经理J0001A01","jingliJ0001A01"]}
  ]}}
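The sketch below cross-checks what ordering on content1.pinyin alone should produce. It sorts the pinyin sort values copied from the response. Elasticsearch compares terms byte-wise, and for these ASCII-only keys that matches Python's string order.

```python
# (_id, content1.pinyin sort value) pairs copied from the response above, in response order.
HITS = [
    ("3", "1jizhubanyuanqita"), ("4", "2jizhubanyuanqita"),
    ("5", "Ajizhubanyuanqita"), ("6", "Bjizhubanyuanqita"),
    ("22", "ExternalId38"), ("9", "aajizhubanyuanqita"),
    ("13", "acjizhubanyuanqita"), ("12", "adjizhubanyuanqita"),
    ("7", "ajizhubanyuanqita"), ("8", "bjizhubanyuanqita"),
    ("23", "ch1008"), ("20", "foo9"), ("18", "kingzhongguo"),
    ("1", "yijizhubanyuanqita"),
    ("17", "zhongguorenminzhengfuzhonguoshanghai13154123"),
    ("16", "zhongguorenminzhengfuzhonguoshanghai13154123zhongguorenminzhengfu"),
    ("14", "zhuban"), ("2", "erjizhubanyuanqita"), ("15", "banyuan"),
    ("10", "baijizhubanyuanqita"), ("11", "jizhu"),
    ("21", "jingliJ0001A01"), ("19", "jingliJ0001A01"),
]

# Sort by the pinyin key only; Python's sort is stable, so tied keys keep input order.
pinyin_order = [doc_id for doc_id, key in sorted(HITS, key=lambda hit: hit[1])]
print(pinyin_order)
```

Because lowercase is false in my_pinyin, digits sort first, then uppercase keys (A, B, ExternalId), then all lowercase keys. Setting lowercase to true would interleave them. Documents 21 and 19 produce the same key, jingliJ0001A01. In Elasticsearch, add a further sort field if the order of such ties matters.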

2.2.5 Nested sort [3]

Sorting on a nested structure:

  • Request
  ###
  GET  {{host}}/bar/_search
  Content-Type: application/json

  {
      "query":{
          "match_all":{

          }
      },
      "from":0,
      "size":23,
      "sort":{
          "content2.fooNested.pinyin":{
              "nested":{
                  "path":"content2"
              },
              "order":"desc"
          }
      }
  }
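  • Response. As in the previous example, each sort array holds the raw string and then the pinyin token, and the hits follow the raw string in descending code-point order.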
  {"took":11,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":23,"max_score":null,"hits":[
    {"_index":"bar","_type":"_doc","_id":"19","_score":null,"_source":{"content":"经理J0001A01","content1":"经理J0001A01","content2":{"fooNested":"经理J0001A01"}},"sort":["经理J0001A01","jingliJ0001A01"]},
    {"_index":"bar","_type":"_doc","_id":"21","_score":null,"_source":{"content":"经理+J0001A01","content1":"经理+J0001A01","content2":{"fooNested":"经理+J0001A01"}},"sort":["经理+J0001A01","jingliJ0001A01"]},
    {"_index":"bar","_type":"_doc","_id":"11","_score":null,"_source":{"content":"级主","content1":"级主","content2":{"fooNested":"级主"}},"sort":["级主","jizhu"]},
    {"_index":"bar","_type":"_doc","_id":"10","_score":null,"_source":{"content":"百级主办员其他","content1":"百级主办员其他","content2":{"fooNested":"百级主办员其他"}},"sort":["百级主办员其他","baijizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"15","_score":null,"_source":{"content":"办员","content1":"办员","content2":{"fooNested":"办员"}},"sort":["办员","banyuan"]},
    {"_index":"bar","_type":"_doc","_id":"2","_score":null,"_source":{"content":"二级主办员其他","content1":"二级主办员其他","content2":{"fooNested":"二级主办员其他"}},"sort":["二级主办员其他","erjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"14","_score":null,"_source":{"content":"主办","content1":"主办","content2":{"fooNested":"主办"}},"sort":["主办","zhuban"]},
    {"_index":"bar","_type":"_doc","_id":"16","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123中国人民政府","content1":"中国人民政府zhonguoshanghai13154123中国人民政府","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123中国人民政府"}},"sort":["中国人民政府zhonguoshanghai13154123中国人民政府","zhongguorenminzhengfuzhonguoshanghai13154123zhongguorenminzhengfu"]},
    {"_index":"bar","_type":"_doc","_id":"17","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123","content1":"中国人民政府zhonguoshanghai13154123","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123"}},"sort":["中国人民政府zhonguoshanghai13154123","zhongguorenminzhengfuzhonguoshanghai13154123"]},
    {"_index":"bar","_type":"_doc","_id":"1","_score":null,"_source":{"content":"一级主办员其他","content1":"一级主办员其他","content2":{"fooNested":"一级主办员其他"}},"sort":["一级主办员其他","yijizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"18","_score":null,"_source":{"content":"king中国","content1":"king中国","content2":{"fooNested":"king中国"}},"sort":["king中国","kingzhongguo"]},
    {"_index":"bar","_type":"_doc","_id":"20","_score":null,"_source":{"content":"foo-9","content1":"foo-9","content2":{"fooNested":"foo-9"}},"sort":["foo-9","foo9"]},
    {"_index":"bar","_type":"_doc","_id":"23","_score":null,"_source":{"content":"ch_1008","content1":"ch_1008","content2":{"fooNested":"ch_1008"}},"sort":["ch_1008","ch1008"]},
    {"_index":"bar","_type":"_doc","_id":"8","_score":null,"_source":{"content":"b级主办员其他","content1":"b级主办员其他","content2":{"fooNested":"b级主办员其他"}},"sort":["b级主办员其他","bjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"7","_score":null,"_source":{"content":"a级主办员其他","content1":"a级主办员其他","content2":{"fooNested":"a级主办员其他"}},"sort":["a级主办员其他","ajizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"12","_score":null,"_source":{"content":"ad级主办员其他","content1":"ad级主办员其他","content2":{"fooNested":"ad级主办员其他"}},"sort":["ad级主办员其他","adjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"13","_score":null,"_source":{"content":"ac级主办员其他","content1":"ac级主办员其他","content2":{"fooNested":"ac级主办员其他"}},"sort":["ac级主办员其他","acjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"9","_score":null,"_source":{"content":"aa级主办员其他","content1":"aa级主办员其他","content2":{"fooNested":"aa级主办员其他"}},"sort":["aa级主办员其他","aajizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"22","_score":null,"_source":{"content":"ExternalId*+_38","content1":"ExternalId*+_38","content2":{"fooNested":"ExternalId*+_38"}},"sort":["ExternalId*+_38","ExternalId38"]},
    {"_index":"bar","_type":"_doc","_id":"6","_score":null,"_source":{"content":"B级主办员其他","content1":"B级主办员其他","content2":{"fooNested":"B级主办员其他"}},"sort":["B级主办员其他","Bjizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"5","_score":null,"_source":{"content":"A级主办员其他","content1":"A级主办员其他","content2":{"fooNested":"A级主办员其他"}},"sort":["A级主办员其他","Ajizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"4","_score":null,"_source":{"content":"2级主办员其他","content1":"2级主办员其他","content2":{"fooNested":"2级主办员其他"}},"sort":["2级主办员其他","2jizhubanyuanqita"]},
    {"_index":"bar","_type":"_doc","_id":"3","_score":null,"_source":{"content":"1级主办员其他","content1":"1级主办员其他","content2":{"fooNested":"1级主办员其他"}},"sort":["1级主办员其他","1jizhubanyuanqita"]}
  ]}}
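The sort clauses in these requests share one shape, so a helper can build them. The sketch below uses a hypothetical pinyin_sort function. It adds the nested block with path only for fields inside a nested object, as in the request above.

```python
import json
from typing import Optional


def pinyin_sort(field: str, order: str = "asc", nested_path: Optional[str] = None) -> dict:
    """Build one sort clause on a pinyin sub-field; add the nested option
    when the field lives inside a nested object."""
    if order not in ("asc", "desc"):
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    clause = {"order": order}
    if nested_path is not None:
        clause["nested"] = {"path": nested_path}
    return {field: clause}


# Rebuild the nested sort request body from this section.
body = {
    "query": {"match_all": {}},
    "from": 0,
    "size": 23,
    "sort": pinyin_sort("content2.fooNested.pinyin", "desc", nested_path="content2"),
}
print(json.dumps(body, indent=2))
```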

3 References

This document is based on Elasticsearch v6.8.
