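Sorting Chinese text in ElasticSearch
1 Requirements
Chinese text should be sorted by its pinyin initials, but ElasticSearch orders string values by their character codes (ASCII order for Latin text, Unicode code point order for Chinese characters), so the resulting order looks arbitrary to a human reader.
In addition, some fields already use ngram_analyzer or ik_analyzer. To keep those analyzers and still sort by pinyin, use the fields option of the index mapping, which lets the same field be analyzed in several different ways.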
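To make the problem concrete, here is a minimal Python sketch that compares plain code point ordering with ordering by a pinyin key. The `PINYIN` table is a hand-written assumption that covers only these four sample strings (its values are copied from the sort keys that appear in the responses later in this document); it is not part of ElasticSearch or of the plugin.

```python
# Hypothetical lookup table: pinyin for four sample strings only.
PINYIN = {
    "一级主办员其他": "yijizhubanyuanqita",
    "二级主办员其他": "erjizhubanyuanqita",
    "百级主办员其他": "baijizhubanyuanqita",
    "主办": "zhuban",
}

names = list(PINYIN)

# Default string comparison uses Unicode code points, like a raw keyword sort.
by_code_point = sorted(names)

# Sorting on the pinyin transcription gives the order a reader expects.
by_pinyin = sorted(names, key=PINYIN.__getitem__)

print(by_code_point)  # ['一级主办员其他', '主办', '二级主办员其他', '百级主办员其他']
print(by_pinyin)      # ['百级主办员其他', '二级主办员其他', '一级主办员其他', '主办']
```

The first list follows the code points of 一, 主, 二 and 百, which has nothing to do with pronunciation; the second follows b, e, y, z.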
2 Solution
2.1 Quickstart elasticsearch-analysis-pinyin
2.1.1 Install elasticsearch-analysis-pinyin
Install the plugin while ElasticSearch is running under Docker.
- Run a command in a running container
docker exec -it docker-elasticsearch bash
docker-elasticsearch is the container name.
- Install elasticsearch-analysis-pinyin with elasticsearch-plugin
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.8.10/elasticsearch-analysis-pinyin-6.8.10.zip
- Restart container
docker restart docker-elasticsearch
2.1.2 Test the analyzer by analyzing a Chinese name, such as 经理+J0001A01
- Request
###
POST {{host}}/_analyze
Content-Type: application/json
{
"tokenizer":{
"type": "pinyin",
"keep_first_letter": false,
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"keep_none_chinese_in_joined_full_pinyin": true,
"keep_none_chinese_in_first_letter": true,
"none_chinese_pinyin_tokenize": false,
"lowercase" : false
},
"text":"经理+J0001A01"
}
See reference 1 for the optional parameters. The configuration above is chosen to produce a single token that is convenient for sorting, not for searching.
- Response
{
"tokens":[
{
"token":"jingliJ0001A01",
"start_offset":0,
"end_offset":0,
"type":"word",
"position":0
}
]
}
- Request
###
POST {{host}}/_analyze
Content-Type: application/json
{
"tokenizer":{
"type" : "pinyin",
"keep_first_letter":true,
"keep_separate_first_letter" : false,
"keep_full_pinyin" : false,
"limit_first_letter_length" : 20,
"lowercase" : false,
"keep_none_chinese":false
},
"text":"中华人民共和国"
}
- Response
{
"tokens":[
{
"token":"zhrmghg",
"start_offset":0,
"end_offset":0,
"type":"word",
"position":0
}
]
}
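The two configurations above can be imitated in a few lines of Python to show what kind of token each one yields. This is only a simplified sketch, not the plugin's algorithm: the per-character `PINYIN` table is a hypothetical stand-in that covers just the two sample texts, and the helper names `joined_full_pinyin` and `first_letters` are invented for illustration.

```python
# Hypothetical per-character pinyin table covering only the two sample texts.
PINYIN = {
    "经": "jing", "理": "li",
    "中": "zhong", "华": "hua", "人": "ren", "民": "min",
    "共": "gong", "和": "he", "国": "guo",
}


def joined_full_pinyin(text: str) -> str:
    """Imitate keep_joined_full_pinyin with non-Chinese letters and digits kept."""
    parts = []
    for ch in text:
        if ch in PINYIN:
            parts.append(PINYIN[ch])
        elif ch.isascii() and ch.isalnum():
            parts.append(ch)  # ASCII letters and digits pass through unchanged
        # Anything else (such as "+", or a character missing from the table) is dropped.
    return "".join(parts)


def first_letters(text: str, limit: int = 20) -> str:
    """Imitate keep_first_letter with keep_none_chinese disabled."""
    letters = [PINYIN[ch][0] for ch in text if ch in PINYIN]
    return "".join(letters)[:limit]


print(joined_full_pinyin("经理+J0001A01"))  # jingliJ0001A01
print(first_letters("中华人民共和国"))  # zhrmghg
```

Either form collapses the whole value into one token, which is what a sort key needs; the first keeps more information and is the one used for the index below.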
2.2 Create an index using elasticsearch-analysis-pinyin
2.2.1 Add a new index
Define your own analyzers in settings, for example ngram_analyzer, ik_analyzer, pinyin_analyzer and so on.
###
PUT {{host}}/bar
Content-Type: application/json
{
"settings":{
"analysis":{
"analyzer":{
"ngram_analyzer":{
"tokenizer":"ngram_tokenizer",
"filter":[
"lowercase"
]
},
"ik_analyzer":{
"tokenizer":"ik_max_word",
"filter":[
"lowercase"
]
},
"my_keyword":{
"tokenizer":"keyword",
"filter":["lowercase"]
},
"pinyin_analyzer":{
"tokenizer": "my_pinyin"
}
},
"tokenizer":{
"ngram_tokenizer":{
"type":"ngram",
"min_gram":1,
"max_gram":50
},
"my_pinyin":{
"type": "pinyin",
"keep_first_letter": false,
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"keep_none_chinese_in_joined_full_pinyin": true,
"keep_none_chinese_in_first_letter": true,
"none_chinese_pinyin_tokenize": false,
"lowercase" : false
}
}
}
}
}
Here ngram_analyzer, ik_analyzer and my_keyword are the existing configuration; pinyin_analyzer is newly added.
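If the existing settings are kept in code, the new analyzer can be merged in before the index is created. The sketch below assumes the settings are held as a Python dict; the helper name `add_pinyin_analyzer` is hypothetical, while the analyzer and tokenizer values are copied from the request above.

```python
import copy

# Tokenizer options copied from the my_pinyin definition above.
PINYIN_TOKENIZER = {
    "type": "pinyin",
    "keep_first_letter": False,
    "keep_full_pinyin": False,
    "keep_joined_full_pinyin": True,
    "keep_none_chinese_in_joined_full_pinyin": True,
    "keep_none_chinese_in_first_letter": True,
    "none_chinese_pinyin_tokenize": False,
    "lowercase": False,
}


def add_pinyin_analyzer(body: dict) -> dict:
    """Return a copy of an index-creation body with pinyin_analyzer added."""
    merged = copy.deepcopy(body)  # leave the caller's dict untouched
    analysis = merged.setdefault("settings", {}).setdefault("analysis", {})
    analysis.setdefault("analyzer", {})["pinyin_analyzer"] = {"tokenizer": "my_pinyin"}
    analysis.setdefault("tokenizer", {})["my_pinyin"] = dict(PINYIN_TOKENIZER)
    return merged


existing = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_keyword": {"tokenizer": "keyword", "filter": ["lowercase"]}
            }
        }
    }
}
updated = add_pinyin_analyzer(existing)
print(sorted(updated["settings"]["analysis"]["analyzer"]))  # ['my_keyword', 'pinyin_analyzer']
```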
2.2.2 Define a document
###
POST {{host}}/bar/_doc/_mapping
Content-Type: application/json
{
"properties":{
"content":{
"type":"text",
"analyzer":"ik_analyzer",
"search_analyzer":"ik_max_word",
"fields":{
"keyword":{
"ignore_above":256,
"type":"keyword"
},
"pinyin":{
"type":"text",
"fielddata":true,
"analyzer":"pinyin_analyzer",
"boost":100
}
}
},
"content1":{
"type":"text",
"analyzer":"ngram_analyzer",
"search_analyzer":"my_keyword",
"fields":{
"keyword":{
"ignore_above":256,
"type":"keyword"
},
"pinyin":{
"type":"text",
"fielddata":true,
"analyzer":"pinyin_analyzer",
"boost":100
}
}
},
"content2":{
"type":"nested",
"properties":{
"fooNested":{
"type":"text",
"analyzer":"ngram_analyzer",
"search_analyzer":"my_keyword",
"fields":{
"keyword":{
"ignore_above":256,
"type":"keyword"
},
"pinyin":{
"type":"text",
"fielddata":true,
"analyzer":"pinyin_analyzer",
"boost":100
}
}
}
}
}
}
}
Because these fields already define an analyzer, fields is needed to analyze the same field in a different way (see reference 2).
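The same pinyin sub-field is repeated for every text field in the mapping, including the one inside the nested object. A small helper can add it mechanically. This is a sketch under the assumption that the mapping properties are held as a Python dict; `with_pinyin_subfield` is a hypothetical name, and the sub-field definition is copied from the mapping above.

```python
import copy

# Sub-field definition copied from the mapping above.
PINYIN_SUBFIELD = {
    "type": "text",
    "fielddata": True,
    "analyzer": "pinyin_analyzer",
    "boost": 100,
}


def with_pinyin_subfield(properties: dict) -> dict:
    """Return a copy of `properties` where every text field has a pinyin sub-field."""
    result = copy.deepcopy(properties)
    for field in result.values():
        if field.get("type") == "text":
            field.setdefault("fields", {})["pinyin"] = dict(PINYIN_SUBFIELD)
        elif field.get("type") == "nested":
            # Recurse into the properties of nested objects.
            field["properties"] = with_pinyin_subfield(field.get("properties", {}))
    return result


mapping = {
    "content1": {"type": "text", "analyzer": "ngram_analyzer"},
    "content2": {
        "type": "nested",
        "properties": {
            "fooNested": {"type": "text", "analyzer": "ngram_analyzer"}
        },
    },
}
updated = with_pinyin_subfield(mapping)
print(updated["content2"]["properties"]["fooNested"]["fields"]["pinyin"]["analyzer"])  # pinyin_analyzer
```

The main analyzer of each field is left as it is, so searching keeps working through ngram_analyzer or ik_analyzer while sorting goes through the pinyin sub-field.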
2.2.3 Running test data
The test data contains digits, upper-case and lower-case English, Chinese, mixed Chinese and English, and so on.
###curl -X POST "localhost:9200/bar/_bulk" -H 'Content-Type: application/json' --data-binary @request
POST {{host}}/_bulk
Content-Type: application/json
{"index": {"_index": "bar", "_type": "_doc", "_id": 1}}
{"content":"一级主办员其他","content1":"一级主办员其他","content2":{"fooNested":"一级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 2}}
{"content":"二级主办员其他","content1":"二级主办员其他","content2":{"fooNested":"二级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 3}}
{"content":"1级主办员其他","content1":"1级主办员其他","content2":{"fooNested":"1级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 4}}
{"content":"2级主办员其他","content1":"2级主办员其他","content2":{"fooNested":"2级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 5}}
{"content":"A级主办员其他","content1":"A级主办员其他","content2":{"fooNested":"A级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 6}}
{"content":"B级主办员其他","content1":"B级主办员其他","content2":{"fooNested":"B级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 7}}
{"content":"a级主办员其他","content1":"a级主办员其他","content2":{"fooNested":"a级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 8}}
{"content":"b级主办员其他","content1":"b级主办员其他","content2":{"fooNested":"b级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 9}}
{"content":"aa级主办员其他","content1":"aa级主办员其他","content2":{"fooNested":"aa级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 10}}
{"content":"百级主办员其他","content1":"百级主办员其他","content2":{"fooNested":"百级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 11}}
{"content":"级主","content1":"级主","content2":{"fooNested":"级主"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 12}}
{"content":"ad级主办员其他","content1":"ad级主办员其他","content2":{"fooNested":"ad级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 13}}
{"content":"ac级主办员其他","content1":"ac级主办员其他","content2":{"fooNested":"ac级主办员其他"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 14}}
{"content":"主办","content1":"主办","content2":{"fooNested":"主办"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 15}}
{"content":"办员","content1":"办员","content2":{"fooNested":"办员"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 16}}
{"content":"中国人民政府zhonguoshanghai13154123中国人民政府","content1":"中国人民政府zhonguoshanghai13154123中国人民政府","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123中国人民政府"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 17}}
{"content":"中国人民政府zhonguoshanghai13154123","content1":"中国人民政府zhonguoshanghai13154123","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 18}}
{"content":"king中国","content1":"king中国","content2":{"fooNested":"king中国"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 19}}
{"content":"经理J0001A01","content1":"经理J0001A01","content2":{"fooNested":"经理J0001A01"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 20}}
{"content":"foo-9","content1":"foo-9","content2":{"fooNested":"foo-9"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 21}}
{"content":"经理+J0001A01","content1":"经理+J0001A01","content2":{"fooNested":"经理+J0001A01"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 22}}
{"content":"ExternalId*+_38","content1":"ExternalId*+_38","content2":{"fooNested":"ExternalId*+_38"}}
{"index": {"_index": "bar", "_type": "_doc", "_id": 23}}
{"content":"ch_1008","content1":"ch_1008","content2":{"fooNested":"ch_1008"}}
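Writing the bulk body by hand is error-prone, because every action line and every source line must be a complete JSON object on its own line and the body must end with a newline. The following sketch generates the body from a list of values; `build_bulk_body` is a hypothetical helper, and the document shape (the same value in content, content1 and content2.fooNested) follows the test data above.

```python
import json


def build_bulk_body(index: str, values: list) -> str:
    """Build a newline-delimited bulk body that indexes one document per value."""
    lines = []
    for doc_id, value in enumerate(values, start=1):
        action = {"index": {"_index": index, "_type": "_doc", "_id": doc_id}}
        source = {
            "content": value,
            "content1": value,
            "content2": {"fooNested": value},
        }
        lines.append(json.dumps(action))
        # ensure_ascii=False keeps the Chinese text readable in the body.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("bar", ["一级主办员其他", "二级主办员其他"])
print(body, end="")
```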
2.2.4 Sort by fields (see reference 3)
- Request
###
GET {{host}}/bar/_search
Content-Type: application/json
{
"query":{
"match_all":{
}
},
"from":0,
"size":23,
"sort":{
"content1.pinyin":{
"order":"asc"
}
}
}
- Response
{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":23,"max_score":null,"hits":[
{"_index":"bar","_type":"_doc","_id":"3","_score":null,"_source":{"content":"1级主办员其他","content1":"1级主办员其他","content2":{"fooNested":"1级主办员其他"}},"sort":["1级主办员其他","1jizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"4","_score":null,"_source":{"content":"2级主办员其他","content1":"2级主办员其他","content2":{"fooNested":"2级主办员其他"}},"sort":["2级主办员其他","2jizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"5","_score":null,"_source":{"content":"A级主办员其他","content1":"A级主办员其他","content2":{"fooNested":"A级主办员其他"}},"sort":["A级主办员其他","Ajizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"6","_score":null,"_source":{"content":"B级主办员其他","content1":"B级主办员其他","content2":{"fooNested":"B级主办员其他"}},"sort":["B级主办员其他","Bjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"22","_score":null,"_source":{"content":"ExternalId*+_38","content1":"ExternalId*+_38","content2":{"fooNested":"ExternalId*+_38"}},"sort":["ExternalId*+_38","ExternalId38"]},
{"_index":"bar","_type":"_doc","_id":"9","_score":null,"_source":{"content":"aa级主办员其他","content1":"aa级主办员其他","content2":{"fooNested":"aa级主办员其他"}},"sort":["aa级主办员其他","aajizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"13","_score":null,"_source":{"content":"ac级主办员其他","content1":"ac级主办员其他","content2":{"fooNested":"ac级主办员其他"}},"sort":["ac级主办员其他","acjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"12","_score":null,"_source":{"content":"ad级主办员其他","content1":"ad级主办员其他","content2":{"fooNested":"ad级主办员其他"}},"sort":["ad级主办员其他","adjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"7","_score":null,"_source":{"content":"a级主办员其他","content1":"a级主办员其他","content2":{"fooNested":"a级主办员其他"}},"sort":["a级主办员其他","ajizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"8","_score":null,"_source":{"content":"b级主办员其他","content1":"b级主办员其他","content2":{"fooNested":"b级主办员其他"}},"sort":["b级主办员其他","bjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"23","_score":null,"_source":{"content":"ch_1008","content1":"ch_1008","content2":{"fooNested":"ch_1008"}},"sort":["ch_1008","ch1008"]},
{"_index":"bar","_type":"_doc","_id":"20","_score":null,"_source":{"content":"foo-9","content1":"foo-9","content2":{"fooNested":"foo-9"}},"sort":["foo-9","foo9"]},
{"_index":"bar","_type":"_doc","_id":"18","_score":null,"_source":{"content":"king中国","content1":"king中国","content2":{"fooNested":"king中国"}},"sort":["king中国","kingzhongguo"]},
{"_index":"bar","_type":"_doc","_id":"1","_score":null,"_source":{"content":"一级主办员其他","content1":"一级主办员其他","content2":{"fooNested":"一级主办员其他"}},"sort":["一级主办员其他","yijizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"17","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123","content1":"中国人民政府zhonguoshanghai13154123","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123"}},"sort":["中国人民政府zhonguoshanghai13154123","zhongguorenminzhengfuzhonguoshanghai13154123"]},
{"_index":"bar","_type":"_doc","_id":"16","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123中国人民政府","content1":"中国人民政府zhonguoshanghai13154123中国人民政府","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123中国人民政府"}},"sort":["中国人民政府zhonguoshanghai13154123中国人民政府","zhongguorenminzhengfuzhonguoshanghai13154123zhongguorenminzhengfu"]},
{"_index":"bar","_type":"_doc","_id":"14","_score":null,"_source":{"content":"主办","content1":"主办","content2":{"fooNested":"主办"}},"sort":["主办","zhuban"]},
{"_index":"bar","_type":"_doc","_id":"2","_score":null,"_source":{"content":"二级主办员其他","content1":"二级主办员其他","content2":{"fooNested":"二级主办员其他"}},"sort":["二级主办员其他","erjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"15","_score":null,"_source":{"content":"办员","content1":"办员","content2":{"fooNested":"办员"}},"sort":["办员","banyuan"]},
{"_index":"bar","_type":"_doc","_id":"10","_score":null,"_source":{"content":"百级主办员其他","content1":"百级主办员其他","content2":{"fooNested":"百级主办员其他"}},"sort":["百级主办员其他","baijizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"11","_score":null,"_source":{"content":"级主","content1":"级主","content2":{"fooNested":"级主"}},"sort":["级主","jizhu"]},
{"_index":"bar","_type":"_doc","_id":"21","_score":null,"_source":{"content":"经理+J0001A01","content1":"经理+J0001A01","content2":{"fooNested":"经理+J0001A01"}},"sort":["经理+J0001A01","jingliJ0001A01"]},
{"_index":"bar","_type":"_doc","_id":"19","_score":null,"_source":{"content":"经理J0001A01","content1":"经理J0001A01","content2":{"fooNested":"经理J0001A01"}},"sort":["经理J0001A01","jingliJ0001A01"]}
]}}
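A response like the one above is hard to read by eye. Once it has been parsed into a dict, the sort values of each hit can be pulled out in a couple of lines. The helper name `sort_keys` is hypothetical, and the sample is trimmed to the first two hits of the response above.

```python
def sort_keys(response: dict) -> list:
    """Return (document id, sort values) pairs in the order the hits arrived."""
    return [(hit["_id"], hit.get("sort", [])) for hit in response["hits"]["hits"]]


# Trimmed sample: only the fields this helper reads, from the first two hits.
sample = {
    "hits": {
        "hits": [
            {"_id": "3", "sort": ["1级主办员其他", "1jizhubanyuanqita"]},
            {"_id": "4", "sort": ["2级主办员其他", "2jizhubanyuanqita"]},
        ]
    }
}

for doc_id, keys in sort_keys(sample):
    print(doc_id, keys)
```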
2.2.5 Nested sort (see reference 3)
Sorting on a field inside a nested structure.
- Request
###
GET {{host}}/bar/_search
Content-Type: application/json
{
"query":{
"match_all":{
}
},
"from":0,
"size":23,
"sort":{
"content2.fooNested.pinyin":{
"nested":{
"path":"content2"
},
"order":"desc"
}
}
}
- Response
{"took":11,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":23,"max_score":null,"hits":[
{"_index":"bar","_type":"_doc","_id":"19","_score":null,"_source":{"content":"经理J0001A01","content1":"经理J0001A01","content2":{"fooNested":"经理J0001A01"}},"sort":["经理J0001A01","jingliJ0001A01"]},
{"_index":"bar","_type":"_doc","_id":"21","_score":null,"_source":{"content":"经理+J0001A01","content1":"经理+J0001A01","content2":{"fooNested":"经理+J0001A01"}},"sort":["经理+J0001A01","jingliJ0001A01"]},
{"_index":"bar","_type":"_doc","_id":"11","_score":null,"_source":{"content":"级主","content1":"级主","content2":{"fooNested":"级主"}},"sort":["级主","jizhu"]},
{"_index":"bar","_type":"_doc","_id":"10","_score":null,"_source":{"content":"百级主办员其他","content1":"百级主办员其他","content2":{"fooNested":"百级主办员其他"}},"sort":["百级主办员其他","baijizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"15","_score":null,"_source":{"content":"办员","content1":"办员","content2":{"fooNested":"办员"}},"sort":["办员","banyuan"]},
{"_index":"bar","_type":"_doc","_id":"2","_score":null,"_source":{"content":"二级主办员其他","content1":"二级主办员其他","content2":{"fooNested":"二级主办员其他"}},"sort":["二级主办员其他","erjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"14","_score":null,"_source":{"content":"主办","content1":"主办","content2":{"fooNested":"主办"}},"sort":["主办","zhuban"]},
{"_index":"bar","_type":"_doc","_id":"16","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123中国人民政府","content1":"中国人民政府zhonguoshanghai13154123中国人民政府","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123中国人民政府"}},"sort":["中国人民政府zhonguoshanghai13154123中国人民政府","zhongguorenminzhengfuzhonguoshanghai13154123zhongguorenminzhengfu"]},
{"_index":"bar","_type":"_doc","_id":"17","_score":null,"_source":{"content":"中国人民政府zhonguoshanghai13154123","content1":"中国人民政府zhonguoshanghai13154123","content2":{"fooNested":"中国人民政府zhonguoshanghai13154123"}},"sort":["中国人民政府zhonguoshanghai13154123","zhongguorenminzhengfuzhonguoshanghai13154123"]},
{"_index":"bar","_type":"_doc","_id":"1","_score":null,"_source":{"content":"一级主办员其他","content1":"一级主办员其他","content2":{"fooNested":"一级主办员其他"}},"sort":["一级主办员其他","yijizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"18","_score":null,"_source":{"content":"king中国","content1":"king中国","content2":{"fooNested":"king中国"}},"sort":["king中国","kingzhongguo"]},
{"_index":"bar","_type":"_doc","_id":"20","_score":null,"_source":{"content":"foo-9","content1":"foo-9","content2":{"fooNested":"foo-9"}},"sort":["foo-9","foo9"]},
{"_index":"bar","_type":"_doc","_id":"23","_score":null,"_source":{"content":"ch_1008","content1":"ch_1008","content2":{"fooNested":"ch_1008"}},"sort":["ch_1008","ch1008"]},
{"_index":"bar","_type":"_doc","_id":"8","_score":null,"_source":{"content":"b级主办员其他","content1":"b级主办员其他","content2":{"fooNested":"b级主办员其他"}},"sort":["b级主办员其他","bjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"7","_score":null,"_source":{"content":"a级主办员其他","content1":"a级主办员其他","content2":{"fooNested":"a级主办员其他"}},"sort":["a级主办员其他","ajizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"12","_score":null,"_source":{"content":"ad级主办员其他","content1":"ad级主办员其他","content2":{"fooNested":"ad级主办员其他"}},"sort":["ad级主办员其他","adjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"13","_score":null,"_source":{"content":"ac级主办员其他","content1":"ac级主办员其他","content2":{"fooNested":"ac级主办员其他"}},"sort":["ac级主办员其他","acjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"9","_score":null,"_source":{"content":"aa级主办员其他","content1":"aa级主办员其他","content2":{"fooNested":"aa级主办员其他"}},"sort":["aa级主办员其他","aajizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"22","_score":null,"_source":{"content":"ExternalId*+_38","content1":"ExternalId*+_38","content2":{"fooNested":"ExternalId*+_38"}},"sort":["ExternalId*+_38","ExternalId38"]},
{"_index":"bar","_type":"_doc","_id":"6","_score":null,"_source":{"content":"B级主办员其他","content1":"B级主办员其他","content2":{"fooNested":"B级主办员其他"}},"sort":["B级主办员其他","Bjizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"5","_score":null,"_source":{"content":"A级主办员其他","content1":"A级主办员其他","content2":{"fooNested":"A级主办员其他"}},"sort":["A级主办员其他","Ajizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"4","_score":null,"_source":{"content":"2级主办员其他","content1":"2级主办员其他","content2":{"fooNested":"2级主办员其他"}},"sort":["2级主办员其他","2jizhubanyuanqita"]},
{"_index":"bar","_type":"_doc","_id":"3","_score":null,"_source":{"content":"1级主办员其他","content1":"1级主办员其他","content2":{"fooNested":"1级主办员其他"}},"sort":["1级主办员其他","1jizhubanyuanqita"]}
]}}
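The plain and nested sort requests differ only in the nested clause, so both bodies can be produced by one function. The sketch below is one possible way to do that; `build_sort_request` is a hypothetical helper, and the field names, paging values and nested path are taken from the two requests in sections 2.2.4 and 2.2.5.

```python
import json


def build_sort_request(field, order="asc", nested_path=None, size=23):
    """Build a match_all search body sorted on `field`, optionally inside a nested path."""
    clause = {"order": order}
    if nested_path is not None:
        # Fields inside a nested object need the path they live under.
        clause["nested"] = {"path": nested_path}
    return {
        "query": {"match_all": {}},
        "from": 0,
        "size": size,
        "sort": {field: clause},
    }


print(json.dumps(build_sort_request("content1.pinyin")))
print(json.dumps(build_sort_request("content2.fooNested.pinyin", "desc", "content2")))
```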
3 References
This document is based on ElasticSearch v6.8.