elk01 Upgrading an Elasticsearch cluster to 5.4

Background

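For historical reasons I have been maintaining two separate ELK systems for our department, and I had long wanted to merge them. Recently a host in one of them started having problems, which finally gave me a reason to do it. The existing system runs 5.2, and the latest official release at the time was 5.4. As the saying goes, a craftsman who wants to do good work must first sharpen his tools, so the latest ELK it is.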

Preparation

I upgraded from ES 5.2 to ES 5.4 using the rolling upgrade method recommended by the official documentation.


The upgrade guide:
https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html

Upgrade steps

Disable shard allocation

When you shut down a node, the allocation process will wait for one minute before starting to replicate the shards that were on that node to other nodes in the cluster, causing a lot of wasted I/O. This can be avoided by disabling allocation before shutting down a node:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
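If you would rather script this than click through Dev Tools, the same request can be sketched in Python. This is only a sketch: it assumes the cluster answers at http://localhost:9200 without authentication (with search-guard enabled you would need HTTPS and credentials), and the function names are my own, not part of any Elasticsearch client.

```python
import json
import urllib.request


def allocation_settings(mode):
    """Build the body for PUT _cluster/settings.

    Only the two values used in this post are accepted here.
    """
    if mode not in ("none", "all"):
        raise ValueError("mode must be 'none' or 'all'")
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def put_allocation(mode, base_url="http://localhost:9200"):
    """Send the setting to the cluster and return the parsed JSON reply.

    base_url is an assumption; point it at one of your own nodes.
    """
    body = json.dumps(allocation_settings(mode)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling put_allocation("none") before stopping a node and put_allocation("all") after it rejoins mirrors the two settings requests in this post. Either way, check that the reply contains "acknowledged": true.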

I ran this directly in Kibana's Dev Tools. The response showed "acknowledged": true, so I took the command to have succeeded.

Stop non-essential indexing and perform a synced flush (Optional)

You may happily continue indexing during the upgrade. However, shard recovery will be much faster if you temporarily stop non-essential indexing and issue a synced-flush request:

POST _flush/synced

A synced flush request is a “best effort” operation. It will fail if there are any pending indexing operations, but it is safe to reissue the request multiple times if necessary.

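This step is optional, but I did it. That "However" in the official docs is scary enough on its own: it is like happily eating hotpot and singing along when bandits suddenly rob you. The command takes a while to respond. If you are not sure it worked, it is safe to run it several more times.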
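Since the request is safe to reissue, the "run it again if in doubt" habit can be written as a small retry loop. The sketch below assumes the reply of POST _flush/synced carries a top-level "_shards" object with a "failed" count, which is how I read the API. do_flush is a hypothetical callable you supply that performs the request and returns the parsed JSON.

```python
import time


def flush_failures(response):
    """Return how many shard copies failed to sync-flush, per the reply."""
    return response.get("_shards", {}).get("failed", 0)


def flush_until_clean(do_flush, attempts=3, delay=5.0):
    """Call do_flush() up to `attempts` times, stopping once nothing failed.

    Returns True if some attempt reported zero failed shard copies,
    False if every attempt still had failures.
    """
    for attempt in range(attempts):
        if flush_failures(do_flush()) == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give pending indexing a moment to drain
    return False
```

A False result is not fatal. It only means recovery of the affected shards may be slower.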

Stop and upgrade a single node

Shut down one of the nodes in the cluster before starting the upgrade.

To upgrade using a Debian or RPM package:

  • Use rpm or dpkg to install the new package. All files should be placed in their proper locations, and config files should not be overwritten.

My hosts run CentOS 7. I installed from the RPM package and manage the service with systemctl, which I find easier to maintain.

First download elasticsearch-5.4.0.rpm, then stop the service and upgrade the package:

systemctl stop elasticsearch.service

[root@es ~]# rpm -Uvh elasticsearch-5.4.0.rpm 
warning: elasticsearch-5.4.0.rpm: Header V4 RSA/SHA512 Signature, key ID d88e42b4: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:elasticsearch-0:5.4.0-1          warning: /etc/elasticsearch/jvm.options created as /etc
################################# [ 50%]
Cleaning up / removing...
   2:elasticsearch-0:5.2.0-1          ################################# [100%]

Upgrade any plugins

Elasticsearch plugins must be upgraded when upgrading a node. Use the elasticsearch-plugin script to install the correct version of any plugins that you need.

This step is important. Some plugins are strict about the Elasticsearch version: if you do not upgrade them, they will refuse to work.

For example, search-guard, which is the one I ran into:

[root@es ~]# cd /usr/share/elasticsearch/
[root@es elasticsearch]# bin/elasticsearch-plugin list
search-guard-5
WARNING: plugin [search-guard-5] is incompatible with version [5.4.0]; was designed for version [5.2.0]
[root@es elasticsearch]# bin/elasticsearch-plugin remove search-guard-5
-> removing [search-guard-5]...

https://github.com/floragunncom/search-guard#search-guard---security-for-elasticsearch

[root@es elasticsearch]# bin/elasticsearch-plugin install \
>   -b com.floragunn:search-guard-5:5.4.0-12
-> Downloading com.floragunn:search-guard-5:5.4.0-12 from maven central
[=================================================] 100%   
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.io.FilePermission /proc/sys/net/core/somaxconn read
* java.lang.RuntimePermission accessClassInPackage.sun.misc
* java.lang.RuntimePermission accessClassInPackage.sun.nio.ch
* java.lang.RuntimePermission accessClassInPackage.sun.security.x509
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.RuntimePermission loadLibrary.*
* java.lang.RuntimePermission setContextClassLoader
* java.lang.RuntimePermission shutdownHooks
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.security.SecurityPermission getProperty.ssl.KeyManagerFactory.algorithm
* java.util.PropertyPermission java.security.debug write
* java.util.PropertyPermission java.security.krb5.conf write
* java.util.PropertyPermission javax.security.auth.useSubjectCredsOnly write
* java.util.PropertyPermission sun.nio.ch.bugLevel write
* java.util.PropertyPermission sun.security.krb5.debug write
* java.util.PropertyPermission sun.security.spnego.debug write
* javax.security.auth.AuthPermission doAs
* javax.security.auth.AuthPermission modifyPrivateCredentials
* javax.security.auth.kerberos.ServicePermission * accept
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
-> Installed search-guard-5
[root@es elasticsearch]#
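With more than one plugin installed, it is easy to miss a warning in the list output. A minimal sketch for picking them out, assuming the warning text has exactly the shape shown in the transcript above (the function name is my own):

```python
import re

# Matches the warning line printed by `bin/elasticsearch-plugin list`
# in the transcript above; other versions may word it differently.
INCOMPATIBLE = re.compile(
    r"WARNING: plugin \[(?P<plugin>[^\]]+)\] is incompatible with version "
    r"\[(?P<node>[^\]]+)\]; was designed for version \[(?P<built_for>[^\]]+)\]"
)


def incompatible_plugins(listing):
    """Return (plugin, node_version, built_for_version) per warning found."""
    return [
        (m.group("plugin"), m.group("node"), m.group("built_for"))
        for m in INCOMPATIBLE.finditer(listing)
    ]
```

Feed it the captured output of the list command. Every tuple it returns is a plugin to remove and reinstall at the matching version before starting the node.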

Start the upgraded node

Start the now upgraded node and confirm that it joins the cluster by checking the log file or by checking the output of this request:

This step is important too. Be sure to check the log file: the details of a successful or failed startup all show up there.

tail -f /var/log/elasticsearch/itlogcluster.log

systemctl start elasticsearch.service

After waiting a little while for startup, you can see that the upgraded ES node has joined the cluster:

GET _cat/nodes
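To confirm not just that the node is back but that it reports the new version, the cat API can be asked for only the name and version columns with GET _cat/nodes?h=name,version. The sketch below parses that text. The helper names are my own, and it assumes the version is the last whitespace-separated token on each line.

```python
def node_versions(cat_nodes_text):
    """Parse the text of GET _cat/nodes?h=name,version into {name: version}."""
    versions = {}
    for line in cat_nodes_text.splitlines():
        # Split from the right so a node name containing spaces stays whole.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        name, version = parts
        versions[name] = version
    return versions


def has_rejoined(cat_nodes_text, node_name, expected_version):
    """True if node_name is listed and reports expected_version."""
    return node_versions(cat_nodes_text).get(node_name) == expected_version
```

During the rolling upgrade the output will show a mix of versions, which is expected until the last node is done.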

Reenable shard allocation

Once the node has joined the cluster, reenable shard allocation to start using the node:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

If the earlier steps went fine, this one should not cause any trouble.

Wait for the node to recover

You should wait for the cluster to finish shard allocation before upgrading the next node. You can check on progress with the _cat/health request:

GET _cat/health

Wait for the status column to move from yellow to green. Status green means that all primary and replica shards have been allocated.

This step takes quite a while. Be sure to wait until relo and init are both 0 and active_shards_percent is 100%.

The official docs add a few notes here that are worth reading carefully:
– During a rolling upgrade, primary shards assigned to a node with the higher version will never have their replicas assigned to a node with the lower version, because the newer version may have a different data format which is not understood by the older version.

– If it is not possible to assign the replica shards to another node with the higher version — e.g. if there is only one node with the higher version in the cluster — then the replica shards will remain unassigned and the cluster health will remain status yellow.

– In this case, check that there are no initializing or relocating shards (the init and relo columns) before proceeding.

– As soon as another node is upgraded, the replicas should be assigned and the cluster health will reach status green.
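The rule in these notes can be sketched as a small decision function: green is fine, and yellow is acceptable only when nothing is initializing or relocating. This version reads the JSON reply of GET _cluster/health rather than the _cat/health text, on the assumption that it carries the status, initializing_shards and relocating_shards fields. The function name is my own.

```python
def safe_to_upgrade_next(health):
    """Decide from a GET _cluster/health reply whether to move on.

    green:  every primary and replica shard is allocated.
    yellow: tolerated only when no shard is initializing or relocating,
            i.e. the remaining replicas are waiting for another node
            on the newer version.
    red or anything else: do not proceed.
    """
    settled = (
        health["initializing_shards"] == 0
        and health["relocating_shards"] == 0
    )
    return health["status"] in ("green", "yellow") and settled
```

Polling this every few seconds until it returns True is one way to automate the wait. I would still look at the health output by hand before stopping the next node.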

Shards that have not been sync-flushed may take some time to recover. The recovery status of individual shards can be monitored with the _cat/recovery request:

GET _cat/recovery
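The full _cat/recovery output is wide. To see only what is still in flight, one option is to request three columns with GET _cat/recovery?h=index,shard,stage and filter out the finished rows. A sketch, assuming a finished recovery reports the stage "done"; the function name is my own.

```python
def unfinished_recoveries(cat_recovery_text):
    """Parse GET _cat/recovery?h=index,shard,stage output.

    Returns (index, shard, stage) for every row whose stage is not "done".
    """
    pending = []
    for line in cat_recovery_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # blank or unexpected line
        index, shard, stage = parts
        if stage != "done":
            pending.append((index, int(shard), stage))
    return pending
```

An empty list means every listed recovery has completed.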

If you stopped indexing, then it is safe to resume indexing as soon as recovery has completed.

Repeat

When the cluster is stable and the node has recovered, repeat the above steps for all remaining nodes.

Repeat the steps above to upgrade the other nodes in the cluster.

Problems encountered

Upgrading the first node went smoothly. When I upgraded the second node, the ES service would not start, and most of the errors in the log were related to search-guard.

I then changed the permissions on two files, which solved the problem.

[root@es2 tools]# chmod 644 /etc/elasticsearch/truststore.jks

[root@es2 tools]# chmod 644 /etc/elasticsearch/itnode-2-keystore.jks
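To catch this before starting the service on the remaining nodes, a quick pre-flight check can help. The sketch below assumes the problem was that the keystore and truststore were not readable by the account Elasticsearch runs as. The log details are not reproduced here, so that is my reading of why chmod 644 fixed it. It simply reports files that lack the world-readable bit that 644 sets.

```python
import os
import stat


def unreadable_by_others(paths):
    """Return the paths whose mode lacks the world-readable bit (o+r)."""
    return [p for p in paths if not os.stat(p).st_mode & stat.S_IROTH]
```

For example, unreadable_by_others(["/etc/elasticsearch/truststore.jks", "/etc/elasticsearch/itnode-2-keystore.jks"]) lists whichever of the two files still needs fixing on that node. Note that 644 makes the files readable by every local user. Tighter ownership-based permissions may suit your environment better.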

When you hit a problem, read the logs carefully. Working it out yourself from the log messages and searching Google are both good ways to a fix.

Closing remarks

Good habits and standardized configuration and management will make later upgrades and maintenance much less painful.

May your upgrade go just as smoothly.

{
  "name" : "itnode-1",
  "cluster_name" : "itlogcluster",
  "cluster_uuid" : "L0FhsEarTN--W-hzgM_nyA",
  "version" : {
    "number" : "5.4.0",
    "build_hash" : "780f8c4",
    "build_date" : "2017-04-28T17:43:27.229Z",
    "build_snapshot" : false,
    "lucene_version" : "6.5.0"
  },
  "tagline" : "You Know, for Search"
}