Benchmarks and miscellaneous notes on the ElastiCache Redis engine

As announced on the Amazon Web Services Blog (Amazon ElastiCache - Now With a Dash of Redis), Redis became available on ElastiCache today. Until now only memcached was supported.
(Amazon Web Services Blog: More Database Power - 20,000 IOPS for MySQL With the CR1 Instance was announced at the same time.)

f:id:con_mame:20130905103345p:plain

Basics

Features

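Redis replication is supported through Replication Groups
In a Replication Group, each added node becomes a slave, and every node is assigned its own endpoint
- The master gets a single shared endpoint, so promoting a read replica to master requires no endpoint change
- Slaves have to be accessed through each node's own endpoint
- Slaves run in read-only mode
Nodes can be created from an RDB file stored in S3
- The contents of a Redis instance currently running on-premises or on EC2 can be migrated as-is
- This was very easy
Redis version is currently 2.6.13
There is a Maintenance Window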

system

redis_version:2.6.13
redis_git_sha1:00000000
redis_git_dirty:0
redis_mode:standalone
os:Amazon ElastiCache
arch_bits:64
multiplexing_api:epoll
gcc_version:0.0.0
process_id:1
run_id:5d523a3bd3f8a8ef0ce2dc617e861c3122a43f56
tcp_port:6379
uptime_in_seconds:3118
uptime_in_days:0
hz:10
lru_clock:1264120
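The INFO output above is a flat list of `key:value` lines. The following is a minimal Python sketch for turning that text into a dict so individual fields can be checked from a script. It assumes the text has already been fetched, for example with `redis-cli INFO`. The function name `parse_info` is hypothetical.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse Redis INFO output ("key:value" lines) into a dict of strings.

    Blank lines and section headers such as "# Replication" are skipped.
    Only the first ":" splits a line, so values may themselves contain colons.
    """
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon at all
            fields[key] = value
    return fields


sample = """redis_version:2.6.13
redis_mode:standalone
os:Amazon ElastiCache
tcp_port:6379
uptime_in_seconds:3118
"""
info = parse_info(sample)
print(info["os"], int(info["uptime_in_seconds"]))
```

All values are kept as strings, and callers convert only the fields they need, such as `int(info["uptime_in_seconds"])`.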

Importing an RDB file

When migrating from an existing environment, you can move your data over easily by importing an RDB file from S3.
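For reference, here is a hedged Python sketch of the request behind this import. It only builds the parameters for the ElastiCache `CreateCacheCluster` API, whose `SnapshotArns` field takes the S3 ARN of the `.rdb` file. Actually sending the request, for example with boto3, is left as a comment. The cluster ID, bucket, and key are hypothetical.

```python
from typing import Any, Dict


def rdb_import_request(cluster_id: str, node_type: str,
                       bucket: str, key: str) -> Dict[str, Any]:
    """Build CreateCacheCluster parameters that seed a Redis node from S3."""
    return {
        "CacheClusterId": cluster_id,
        "Engine": "redis",
        "EngineVersion": "2.6.13",  # the version available at the time of writing
        "CacheNodeType": node_type,
        "NumCacheNodes": 1,  # a Redis cache cluster has a single node
        "SnapshotArns": [f"arn:aws:s3:::{bucket}/{key}"],
    }


params = rdb_import_request("my-redis", "cache.m1.small", "my-bucket", "dump.rdb")
print(params["SnapshotArns"])
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("elasticache").create_cache_cluster(**params)
```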

f:id:con_mame:20130905172103p:plain

Performance

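Performance varies with the size and type of the data you store, so always run your own benchmarks before use to confirm that you get the performance you need and to work out how many nodes you require.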

The tests below simulate a particular production environment.

Test items

Baseline performance of a single node

requests per second
100,000 keys / 1000 clients / 100,000,000 variables / 75 bytes

	cache.m1.small	cache.m1.medium	cache.m1.large	cache.m1.xlarge	cache.m2.xlarge	cache.m2.2xlarge	cache.m2.4xlarge	cache.c1.xlarge
SET	10738.83	17190.99	22537.6	16488.05	18964.54	19512.2	18839.49	19774.57
GET	11944.58	17208.74	22187.71	17975.91	20108.59	19394.88	18467.22	19413.71
INCR	10894.43	17364.13	20350.02	18406.04	20635.57	18740.63	18274.85	19735.54
LPUSH	11298.16	18978.93	22810.22	19383.6	20271.64	19782.39	18765.25	21413.28
LPOP	11189.44	17649.13	21682.57	19409.94	18907.17	19615.54	18484.29	20263.42
SADD	10986.6	17349.06	20275.75	18508.24	18712.58	18399.26	17674.09	19535.07
SPOP	10672.36	18093	21939.45	20020.02	19531.25	19040.37	18545.99	20316.95
LRANGE_100	4383.85	8798.17	8550.66	8817.56	8726.77	8597.71	8690.36	8901.55
LRANGE_300	1365.32	2988.11	2911.63	2253	3059.41	2957.88	2912.48	3095.02
LRANGE_500	747.76	1978.67	1995.41	2040.15	1965.56	1837.02	1725.63	1623.59
LRANGE_600	473.79	1524.67	1522.93	1591.98	1513.34	1482.27	1543.83	1254.25
MSET	5904.58	15857.91	11161.96	18005.04	16871.94	18175.21	18201.67	17649.13

f:id:con_mame:20130905103302p:plain

latency

Unit: milliseconds
10,000 samples

	cache.m1.small	cache.m1.medium	cache.m1.large	cache.m1.xlarge	cache.m2.xlarge	cache.m2.2xlarge	cache.m2.4xlarge	cache.c1.xlarge
latency min	0	0	0	0	0	0	0	0
latency max	38	23	16	11	5	7	15	8
latency avg	0.77	0.38	0.34	0.31	0.4	0.34	0.37	0.37

f:id:con_mame:20130905103413p:plain

Apart from cache.m1.small, average latency barely differs across instance types.

Latency between Redis running on an EC2 instance and ELB is 0.02-0.08 milliseconds, and EC2->ELB is in the same range, so EC2->ElastiCache latency is somewhat higher.

Adding Cache Replicas

Does creating a Cache Replica while the master holds a large amount of data cause stalls?

cache.m1.small / 1GB

replica 1

GET: 12482.84 requests per second
GET: 10188.49 requests per second
GET: 11746.74 requests per second
GET: 14409.22 requests per second
GET: 6359.71 requests per second <- bgsave start
GET: 7206.17 requests per second
GET: 8850.34 requests per second
GET: 13363.62 requests per second <- online

replica 3

GET: 10001.00 requests per second <- bgsave start (1st replica)
GET: 9691.80 requests per second
GET: 9633.91 requests per second
GET: 10574.18 requests per second
GET: 12965.12 requests per second
GET: 11664.53 requests per second
GET: 6945.89 requests per second <- bgsave start (2 replicas) / replication
GET: 8538.98 requests per second
GET: 9387.03 requests per second
GET: 10625.86 requests per second  <- online

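There was some stalling between the bgsave and the point where the transfer finished and the replica reached online status. Attaching many replicas at once sends the full dataset to every one of them simultaneously, so the transfer took longer with high parallelism, but the degradation was only a few percent. The impact depends on how much data the master holds.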

On cache.m2.4xlarge with 10GB of data, rdb_last_bgsave_time_sec was about 120 sec. Transferring to a single replica took 50-60 sec or less, which is over 1Gbps.
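The "over 1Gbps" figure follows from simple arithmetic. This Python sketch assumes decimal units (1 GB = 8 Gb) and ignores protocol overhead.

```python
def throughput_gbps(data_gb: float, seconds: float) -> float:
    """Average throughput in Gbps, using decimal units (1 GB = 8 Gb)."""
    return data_gb * 8 / seconds


for secs in (50, 60):
    print(f"10 GB in {secs} s -> {throughput_gbps(10, secs):.2f} Gbps")
```

Both ends of the 50-60 sec range come out above 1 Gbps (1.60 and 1.33).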

Parameter Group / node type

Node-type-specific values

CACHENODETYPESPECIFICPARAMETER  maxmemory  system  integer  false  2.6.13
      CACHENODETYPESPECIFICVALUE  cache.c1.xlarge   6501171200
      CACHENODETYPESPECIFICVALUE  cache.m1.large    7025459200
      CACHENODETYPESPECIFICVALUE  cache.m1.medium   3093299200
      CACHENODETYPESPECIFICVALUE  cache.m1.small    943718400
      CACHENODETYPESPECIFICVALUE  cache.m1.xlarge   14889779200
      CACHENODETYPESPECIFICVALUE  cache.m2.2xlarge  35022438400
      CACHENODETYPESPECIFICVALUE  cache.m2.4xlarge  70883737600
      CACHENODETYPESPECIFICVALUE  cache.m2.xlarge   17091788800
      CACHENODETYPESPECIFICVALUE  cache.t1.micro    142606336

CACHEPARAMETER  activerehashing                                 yes           system  string   true   2.6.13
CACHEPARAMETER  appendfsync                                     everysec      system  string   true   2.6.13
CACHEPARAMETER  appendonly                                      no            system  string   true   2.6.13
CACHEPARAMETER  client-output-buffer-limit-normal-hard-limit    0             system  integer  true   2.6.13
CACHEPARAMETER  client-output-buffer-limit-normal-soft-limit    0             system  integer  true   2.6.13
CACHEPARAMETER  client-output-buffer-limit-normal-soft-seconds  0             system  integer  true   2.6.13
CACHEPARAMETER  client-output-buffer-limit-pubsub-hard-limit    33554432      system  integer  true   2.6.13
CACHEPARAMETER  client-output-buffer-limit-pubsub-soft-limit    8388608       system  integer  true   2.6.13
CACHEPARAMETER  client-output-buffer-limit-pubsub-soft-seconds  60            system  integer  true   2.6.13
CACHEPARAMETER  client-output-buffer-limit-slave-hard-limit     268435456     system  integer  false  2.6.13
CACHEPARAMETER  client-output-buffer-limit-slave-soft-limit     67108864      system  integer  false  2.6.13
CACHEPARAMETER  client-output-buffer-limit-slave-soft-seconds   60            system  integer  false  2.6.13
CACHEPARAMETER  databases                                       16            system  integer  true   2.6.13
CACHEPARAMETER  hash-max-ziplist-entries                        512           system  integer  true   2.6.13
CACHEPARAMETER  hash-max-ziplist-value                          64            system  integer  true   2.6.13
CACHEPARAMETER  list-max-ziplist-entries                        512           system  integer  true   2.6.13
CACHEPARAMETER  list-max-ziplist-value                          64            system  integer  true   2.6.13
CACHEPARAMETER  lua-time-limit                                  5000          system  integer  false  2.6.13
CACHEPARAMETER  maxclients                                      65000         system  integer  false  2.6.13
CACHEPARAMETER  maxmemory-policy                                volatile-lru  system  string   true   2.6.13
CACHEPARAMETER  maxmemory-samples                               3             system  integer  true   2.6.13
CACHEPARAMETER  set-max-intset-entries                          512           system  integer  true   2.6.13
CACHEPARAMETER  slave-allow-chaining                            no            system  string   false  2.6.13
CACHEPARAMETER  slowlog-log-slower-than                         10000         system  integer  true   2.6.13
CACHEPARAMETER  slowlog-max-len                                 128           system  integer  true   2.6.13
CACHEPARAMETER  tcp-keepalive                                   0             system  integer  true   2.6.13
CACHEPARAMETER  timeout                                         0             system  integer  true   2.6.13
CACHEPARAMETER  zset-max-ziplist-entries                        128           system  integer  true   2.6.13
CACHEPARAMETER  zset-max-ziplist-value                          64            system  integer  true   2.6.13
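The node-type-specific maxmemory values are easier to compare in MiB, and they can drive a rough sizing check. This Python sketch copies the byte values from the listing. The 25% headroom in `smallest_node_for` is an arbitrary assumption, not an ElastiCache recommendation, and the helper names are hypothetical.

```python
# maxmemory (bytes) per node type, copied from the node-type-specific listing.
MAXMEMORY_BYTES = {
    "cache.t1.micro": 142606336,
    "cache.m1.small": 943718400,
    "cache.m1.medium": 3093299200,
    "cache.c1.xlarge": 6501171200,
    "cache.m1.large": 7025459200,
    "cache.m1.xlarge": 14889779200,
    "cache.m2.xlarge": 17091788800,
    "cache.m2.2xlarge": 35022438400,
    "cache.m2.4xlarge": 70883737600,
}


def maxmemory_mib(node_type: str) -> int:
    """maxmemory of a node type in MiB (all listed values divide evenly)."""
    return MAXMEMORY_BYTES[node_type] // (1024 * 1024)


def smallest_node_for(dataset_bytes: int, headroom: float = 0.25) -> str:
    """Smallest node type whose maxmemory holds the dataset plus `headroom`."""
    needed = dataset_bytes * (1 + headroom)
    fitting = [(limit, node) for node, limit in MAXMEMORY_BYTES.items()
               if limit >= needed]
    if not fitting:
        raise ValueError("dataset does not fit in a single node")
    return min(fitting)[1]


print(maxmemory_mib("cache.m1.small"))    # 900
print(smallest_node_for(10 * 1024 ** 3))  # cache.m1.xlarge
```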

If the master node holds a lot of data and you attach many slaves at once in a high-load environment, pay some attention to Redis's buffers (the client-output-buffer-limit settings).
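While a new slave is being synced, writes arriving at the master are queued in that slave's output buffer. Redis documents the disconnect rule this way: the client is dropped immediately when the hard limit is reached, or when usage stays above the soft limit continuously for the soft-seconds period. The following is a minimal Python sketch that replays sampled buffer sizes against that rule. It uses the slave-class values from the parameter listing. The sample timelines are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class OutputBufferLimit:
    hard_bytes: int  # 0 disables the hard limit
    soft_bytes: int  # 0 disables the soft limit
    soft_seconds: int


# Slave-class values from the parameter listing.
SLAVE_LIMIT = OutputBufferLimit(hard_bytes=268435456, soft_bytes=67108864,
                                soft_seconds=60)


def should_disconnect(samples: Iterable[Tuple[float, int]],
                      limit: OutputBufferLimit) -> bool:
    """Replay (time_s, buffer_bytes) samples, in time order, against a limit.

    Disconnect as soon as the hard limit is reached, or once usage has stayed
    at or above the soft limit for more than soft_seconds without dipping below.
    """
    over_soft_since: Optional[float] = None
    for t, used in samples:
        if limit.hard_bytes and used >= limit.hard_bytes:
            return True
        if limit.soft_bytes and used >= limit.soft_bytes:
            if over_soft_since is None:
                over_soft_since = t
            elif t - over_soft_since > limit.soft_seconds:
                return True
        else:
            over_soft_since = None  # dropped below the soft limit: reset timer
    return False


MiB = 1024 * 1024
print(should_disconnect([(0, 300 * MiB)], SLAVE_LIMIT))  # True: hard limit
print(should_disconnect([(0, 70 * MiB), (30, 70 * MiB), (61, 70 * MiB)], SLAVE_LIMIT))  # True
print(should_disconnect([(0, 70 * MiB), (30, 10 * MiB), (61, 70 * MiB)], SLAVE_LIMIT))  # False
```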

Unavailable commands

Disabled via rename-command:
CONFIG SET / GET
BGSAVE / SAVE
SLAVEOF and other control commands
LASTSAVE can still be used

# Replication
role:slave
master_host:xxx.9xxxx.0001.apne1.cache.amazonaws.com
master_port:6379
master_link_status:down

Writing to disk

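INFO works, but because CONFIG GET does not, you cannot see any settings other than those exposed in the Parameter Group. The save setting cannot be viewed either, so it is unclear whether the save timing has been tuned. I am still investigating whether AOF file growth is handled (auto-aof-rewrite-min-size and the like).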

I am still investigating when data is written to disk. The BGREWRITEAOF command is not available.

Replication Group

Replication Groups and Read Replicas - Amazon ElastiCache

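Standard Redis master-slave replication
Running INFO on the master shows that the slaves are on a 172.x.x.x network (the slaves' IP addresses)
On the slaves, the master is specified by a domain name
You can also attach a Redis instance on EC2 as a slave of the ElastiCache Redis (not the other way round)
Slaves are in read-only mode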

Deletion
- Delete the read replicas first
- Delete the Replication Group before deleting the master node
- Deleting the Replication Group automatically deletes the master node too (apparently it cannot be kept...)

On the master

# Replication
role:master
connected_slaves:5
slave0:172.16.xxx.xxx,6379,online
slave1:172.16.xxx.xxx,6379,online
slave2:172.16.xxx.xxx,6379,online
slave3:172.16.xxx.xxx,6379,online
slave4:172.16.xxx.xxx,6379,online

On a slave

# Replication
role:slave
master_host:xxxxx.xxx.0001.internal.apne1.cache.amazonaws.com   <- prefixed with the Replication Group name
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_priority:100
slave_read_only:1
connected_slaves:0
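The fields above are enough for a basic health check on a slave. This Python sketch assumes the INFO output has already been parsed into a dict of strings, for example by splitting each `key:value` line on its first colon. The function name and the 10-second threshold are hypothetical choices, not ElastiCache settings.

```python
from typing import Mapping


def replica_is_healthy(info: Mapping[str, str], max_io_age_s: int = 10) -> bool:
    """Judge a slave from its INFO Replication fields (values as strings)."""
    if info.get("role") != "slave":
        return False
    if info.get("master_link_status") != "up":
        return False
    if info.get("master_sync_in_progress", "0") != "0":
        return False  # still receiving the initial data from the master
    # Seconds since the slave last heard from the master; missing = unknown.
    age = info.get("master_last_io_seconds_ago")
    return age is not None and 0 <= int(age) <= max_io_age_s


slave_info = {
    "role": "slave",
    "master_link_status": "up",
    "master_last_io_seconds_ago": "1",
    "master_sync_in_progress": "0",
    "slave_read_only": "1",
}
print(replica_is_healthy(slave_info))
```

A slave reporting `master_link_status:down` fails the check.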

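When you create a Replication Group,
f:id:con_mame:20130905122136p:plain
you can spread it across AZs, as shown above. A shared endpoint for the master node is also issued, and applications that connect through it keep working without having to notice when the master changes.
After clicking promote, even once the modify has finished, responses take close to 3 seconds for 2-3 minutes, and the master status is down during that time.
Slave nodes have to be addressed by their individual node names, so balancing across them lets applications avoid having to deal with a slave going down.
Each slave connects through the Replication Group's internal shared endpoint, which users cannot resolve via DNS.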
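Because slaves must be addressed one by one, some client-side balancing is needed. This is a minimal Python sketch of round-robin selection that skips slaves reported as down. `ReplicaRoundRobin` and the endpoint names are hypothetical. `is_up` stands in for whatever health check you use, such as a PING or the `master_link_status` field from INFO.

```python
import itertools
from typing import Callable, Iterable, List


class ReplicaRoundRobin:
    """Round-robin over slave endpoints, skipping ones reported as down."""

    def __init__(self, endpoints: Iterable[str],
                 is_up: Callable[[str], bool]) -> None:
        self._endpoints: List[str] = list(endpoints)
        if not self._endpoints:
            raise ValueError("need at least one replica endpoint")
        self._is_up = is_up
        self._cycle = itertools.cycle(self._endpoints)

    def next_endpoint(self) -> str:
        # Check each endpoint at most once per call so an all-down set fails fast.
        for _ in range(len(self._endpoints)):
            candidate = next(self._cycle)
            if self._is_up(candidate):
                return candidate
        raise RuntimeError("no replica endpoint is up")


rr = ReplicaRoundRobin(["replica-1", "replica-2", "replica-3"],
                       is_up=lambda e: e != "replica-2")
print([rr.next_endpoint() for _ in range(4)])
```

This keeps the logic in the application. A TCP load balancer in front of the slaves would be the alternative, with the application talking to one address.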

Resolving the master node's shared endpoint gives:

replication-group-hoge.xxxx.ng.0001.apne1.cache.amazonaws.com. 15 IN CNAME xxx.xxx.0001.apne1.cache.amazonaws.com.
xxx.xxx.0001.apne1.cache.amazonaws.com. 15 IN A xxx.xxx.xxx.xxx

In other words, the master node is switched over by changing a CNAME. As with any AWS service, caching the IP address for a long time can therefore leave you unable to connect.
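Given the 15-second TTL above and the 2-3 minute window after a promote, a client should resolve the endpoint again on every reconnect attempt rather than hold on to an old IP. The following is a minimal Python sketch of that approach. The `connect` callable is supplied by the caller (for example `lambda ip: socket.create_connection((ip, 6379), timeout=5)`). The retry counts and delays are assumptions to tune. The sketch does nothing about caching done by the OS resolver or a language runtime.

```python
import socket
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    hostname: str,
    connect: Callable[[str], T],
    resolve: Callable[[str], str] = socket.gethostbyname,
    max_attempts: int = 30,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Connect to `hostname`, resolving DNS again on every attempt.

    The IP address is never reused across attempts, so once the CNAME points
    at a promoted master, the next attempt picks up the new address.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        try:
            ip = resolve(hostname)  # fresh lookup on every attempt
            return connect(ip)
        except OSError as exc:  # DNS errors, refused connections, timeouts
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: 0.5 s, 1 s, 2 s, ... capped at max_delay.
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ConnectionError(f"could not connect to {hostname}") from last_error
```

With the default values, the total backoff exceeds four minutes, which covers the 2-3 minute promote window observed above.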

Monitoring

f:id:con_mame:20130905122612p:plain
As shown here, in addition to server resources, statistics on the types of commands issued to Redis and the replication status are monitored.

For replication status:

> get "ElastiCacheMasterReplicationTimestamp"
"2013-09-05T01:25:01.001Z"

It appears to measure lag by reading back, like this, a time that was set on the master node.
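You can do the same check yourself by reading that key on a replica and comparing it with the current UTC time. With redis-py, `get` returns bytes unless `decode_responses=True` is set, so decode the value first. This Python sketch only does the time arithmetic. The result includes both the replication delay and however long ago the master last wrote the key, plus any clock skew, and I do not know how often ElastiCache writes it.

```python
from datetime import datetime, timezone


def replication_lag_seconds(stamp: str, now: datetime) -> float:
    """Seconds between the timestamp read from a node and `now` (UTC-aware)."""
    written = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    written = written.replace(tzinfo=timezone.utc)  # trailing "Z" means UTC
    return (now - written).total_seconds()


lag = replication_lag_seconds(
    "2013-09-05T01:25:01.001Z",
    now=datetime(2013, 9, 5, 1, 25, 3, 1000, tzinfo=timezone.utc),
)
print(lag)  # 2.0
```

In practice, `now` would be `datetime.now(timezone.utc)`.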

Bonus

Fork time in different systems

http://redis.io/topics/latency

Linux beefy VM on VMware 6.0GB RSS forked in 77 milliseconds (12.8 milliseconds per GB).
Linux running on physical machine (Unknown HW) 6.1GB RSS forked in 80 milliseconds (13.1 milliseconds per GB)
Linux running on physical machine (Xeon @ 2.27Ghz) 6.9GB RSS forked in 62 milliseconds (9 milliseconds per GB).
Linux VM on 6sync (KVM) 360 MB RSS forked in 8.2 milliseconds (23.3 milliseconds per GB).
Linux VM on EC2 (Xen) 6.1GB RSS forked in 1460 milliseconds (239.3 milliseconds per GB).
Linux VM on Linode (Xen) 0.9GB RSS forked in 382 milliseconds (424 milliseconds per GB).

Fork is something of a bottleneck when running Redis on AWS. ElastiCache presumably runs on Xen as well, so it is probably affected to some degree.
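To get a feel for what these per-GB figures mean for the 10GB dataset in the cache.m2.4xlarge test, here is a rough Python extrapolation. It assumes fork time scales linearly with RSS (redis.io attributes the cost to copying page tables) and that ElastiCache nodes behave like the EC2 (Xen) figure. Neither assumption was verified here.

```python
# Fork cost per GB of RSS, as quoted from http://redis.io/topics/latency.
FORK_MS_PER_GB = {
    "Linux VM on EC2 (Xen)": 239.3,
    "Linux VM on VMware": 12.8,
    "Linux on physical machine (Xeon @ 2.27Ghz)": 9.0,
}


def estimated_fork_ms(rss_gb: float, ms_per_gb: float) -> float:
    """Linear extrapolation of fork time from a per-GB figure."""
    return rss_gb * ms_per_gb


for system, per_gb in FORK_MS_PER_GB.items():
    print(f"{system}: ~{estimated_fork_ms(10, per_gb):.0f} ms for 10 GB RSS")
```

On the Xen figure, that is roughly 2.4 seconds of blocking per fork, compared with about 0.1 seconds on physical hardware. The fork is only the start of a bgsave, so this is separate from the roughly 120 sec rdb_last_bgsave_time_sec observed above.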

Impressions

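Importing RDB files and having a single endpoint for the master node are welcome, but I am hoping for an endpoint that balances across the slaves and for more user-tunable parameters, such as the SAVE interval!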

Being able to place slaves across multiple AZs was also nice.

Also, don't forget that there is a Maintenance Window, so there will be times when you cannot connect.

まめ畑

Writing at a leisurely pace