Sap IT notes

Ceph. Manual installation.



Here I describe, step by step, how to install a Ceph Storage Cluster (Hammer release, 0.94) on a CentOS 7 server.

During the installation process I used documentation from:

I think a manual Ceph installation helps you learn more about how to set it up and how it works, but the manual method is not very convenient for deploying several clusters in a production environment.
There is an official utility, ceph-deploy, that automates Ceph cluster deployment. Still, walking through the manual installation procedure can help you design your own Chef, Puppet, etc. recipes as well.

1. Set up repos and install packages.

2. Creating and configuring the Ceph cluster.

3. Adding and configuring OSDs.

So, following the manual, perform the steps below.

1. Set up repos and install packages.

Import the release key (release.asc):

rpm --import ';a=blob_plain;f=keys/release.asc'


Add the Ceph repo:

name=Ceph packages for $basearch

name=Ceph noarch packages

name=Ceph source packages

***Set priority=2, replace {ceph-release} with hammer and replace {distro} with el7.
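The substitutions from the note above can be scripted. This is only a minimal sketch: the function name fill_repo_template is made up for illustration, and the sample input line uses a placeholder host (example.invalid), not the real repository URL.

```shell
# Fill in the placeholders of a ceph.repo template read from stdin.
# Assumption: the template contains the literal tokens {ceph-release} and {distro}
# and a line starting with "priority=".
fill_repo_template() {
    sed -e 's/{ceph-release}/hammer/g' \
        -e 's/{distro}/el7/g' \
        -e 's/^priority=.*/priority=2/'
}

# Illustrative input only; the host name below is a placeholder.
printf '%s\n' \
    'baseurl=http://example.invalid/rpm-{ceph-release}/{distro}/$basearch' \
    'priority=1' \
    | fill_repo_template
```

On a real server you would probably feed it your saved template and redirect the result to the repo file under /etc/yum.repos.d/.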


Install the EPEL repo for our version of CentOS:

rpm -ivh epel-release-7-5.noarch.rpm

Install yum-plugin-priorities:

yum install -y yum-plugin-priorities

Install additional packages:

yum install snappy leveldb gdisk python-argparse gperftools-libs

And finally install the ceph package:

yum install ceph

***Ceph is very sensitive to clock skew, so you should set up NTP on all Ceph cluster servers.



2. Creating and configuring the Ceph cluster.

MON (monitor) nodes are the foundation of a Ceph cluster, so let’s start with them.

Create a draft of the configuration file /etc/ceph/ceph.conf:

We also have to assign a unique ID to our cluster, the fsid parameter. We can generate it with the uuidgen command:

[global]
fsid = 4ab708b9-1f1c-416b-b5f2-d47000e4b2b7
public network =
cluster_network =
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon osd full ratio = .90
mon osd nearfull ratio = .80
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 128
osd pool default pgp num = 128
mon initial members = ceph01,ceph02,ceph03

[mon]
mon host = ceph01
mon addr =

[osd]
journal_dio = true
journal_aio = true
journal_block_align = true

[osd.0]
host = ceph01
public_addr =
cluster_addr =
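
The fsid value in the draft above is just a UUID. A small sketch of generating one and printing it as a config line follows; the helper name gen_fsid and the fallback to the Linux kernel’s UUID source are my own assumptions, not part of the Ceph tooling.

```shell
# Generate a UUID for the fsid parameter.
# Assumption: fall back to the Linux kernel's UUID source if uuidgen is not installed.
gen_fsid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    else
        cat /proc/sys/kernel/random/uuid
    fi
}

fsid=$(gen_fsid)
# Print the line as it would appear in ceph.conf.
echo "fsid = ${fsid}"
```

Generate the value once and reuse it everywhere (ceph.conf and monmaptool); every node of the cluster must see the same fsid.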

Next, create the keyring, the secret key and the admin user (using this manual page):

Create the keyring for the monitor servers:
ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'

Create the admin user and generate its key:
ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key \
-n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'

Import ceph.client.admin.keyring into ceph.mon.keyring:
ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring

Set the permissions of ceph.conf and the ceph.client.admin.keyring file with chmod 644.

Create the monitor map (monmap):

monmaptool --create --add {hostname} {ip-address} --fsid {uuid} /tmp/monmap

where {hostname} is the short hostname (hostname -s), {ip-address} is the IP address of the first monitor server and {uuid} is the unique cluster ID (fsid).

monmaptool --create --add ceph01 --fsid 4ab708b9-1f1c-416b-b5f2-d47000e4b2b7 /tmp/monmap

Create a directory for MON server data:

mkdir /var/lib/ceph/mon/{cluster-name}-{hostname}

where {cluster-name} is the cluster name (I use “ceph”) and {hostname} is the short hostname (mentioned above).

mkdir /var/lib/ceph/mon/ceph-ceph01

Create the monitor file system in the monitor server data directory:

ceph-mon [--cluster {cluster-name}] --mkfs -i {hostname} --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring

Example (the cluster name can be omitted here because the default name, ceph, is used):
ceph-mon -i ceph01 --mkfs --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring

Now let’s start the first MON server:
[root@ceph01 tmp]# service ceph start mon
=== mon.ceph01 ===
Starting Ceph mon.ceph01 on ceph01...
Running as unit run-25539.service.
Starting ceph-create-keys on ceph01...

Ceph monitors use the Paxos algorithm to reach consensus, which requires a majority of the monitors to be up. For that reason an odd number of monitors (1, 3, 5 and so forth) is recommended. In my case the cluster will have 3 MON nodes.
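The preference for odd counts comes down to majority arithmetic: a quorum needs more than half of the monitors. A quick sketch of that calculation (quorum_size is just an illustrative helper name):

```shell
# Majority needed for a given number of monitors.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# For each monitor count, print the quorum size and how many failures are tolerated.
for n in 1 2 3 4 5; do
    q=$(quorum_size "$n")
    echo "mons=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

With these numbers, 4 monitors tolerate the same single failure as 3, so an even count adds a server without adding resilience.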
To add two more MON nodes, follow the manual page.
***First we have to update our configuration file to add the two new MONs, then distribute the new version of the config to all MON servers.
[mon]
mon host = ceph01,ceph02,ceph03
mon addr =,,

[mon.ceph01]
host = ceph01
mon addr =

[mon.ceph02]
host = ceph02
mon addr =

[mon.ceph03]
host = ceph03
mon addr =

For each of the two new MONs, do the following:

1) mkdir /var/lib/ceph/mon/ceph-{hostname} – create the MON server data directory.
2) Copy the keyring file (/tmp/ceph.mon.keyring) and the map (/tmp/monmap) from the first (initial) MON server. Alternatively, get the current map with ceph mon getmap -o /path/to/output-file on the first MON server and then copy it to the new one.
3) Create the file system: ceph-mon -i {hostname} --mkfs --monmap /tmp/monmap \
--keyring /tmp/ceph.mon.keyring

4) Start the mon service: service ceph start mon
5) Add the mon server to the cluster (in this example we add the ceph02 MON server):
[root@ceph01 tmp]# ceph mon add ceph02
added mon.ceph02 at
***If you try to add a server twice, you will get an error: Error EEXIST: mon.ceph02 at already exists

So the cluster is created, and we can check its status:
[root@ceph01 tmp]# ceph -s
cluster 928269ea-de0f-457d-9623-69acd5d8fd2c
too many PGs per OSD (490 > max 300)
monmap e3: 3 mons at {ceph01=,ceph02=,ceph03=}
election epoch 80, quorum 0,1,2 ceph01,ceph02,ceph03



3. Adding and configuring OSDs.

Ceph OSD stands for object storage daemon. An OSD handles writing data to disk, replication, recovery and cluster data rebalancing. To keep 2 copies of the data we need at least 2 OSDs (the default is 3 copies). Usually one OSD maps to one physical disk, which is considered best practice.

Beforehand we have to do the following:
1) format the disks with the xfs filesystem (xfs is recommended, but you can use any filesystem that supports xattr);
2) create a directory for each OSD: /var/lib/ceph/osd/ceph-{osd-number}, for example /var/lib/ceph/osd/ceph-0 (on each OSD server);
3) edit /etc/fstab and add the entries for the OSD disks.

UUID=b20d7a97-74c1-4a01-8e28-00c7f6af7126 /var/lib/ceph/osd/ceph-0 xfs defaults 0 1
UUID=45f8b67d-2f2c-49c9-9d85-d3f6d229a445 /var/lib/ceph/osd/ceph-1 xfs defaults 0 1
UUID=91f0ca43-527f-4ea5-bcd3-e71370c8a460 /var/lib/ceph/osd/ceph-2 xfs defaults 0 1
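
The fstab lines above follow a fixed pattern, so they can be produced by a small helper. This is a sketch only: fstab_entry is a name I made up, and /dev/sdb1 in the comment is a hypothetical device.

```shell
# Print an /etc/fstab line for one OSD data disk.
# $1 = filesystem UUID, $2 = OSD number
fstab_entry() {
    printf 'UUID=%s /var/lib/ceph/osd/ceph-%s xfs defaults 0 1\n' "$1" "$2"
}

# On a real server the UUID would come from the formatted partition, e.g.
#   uuid=$(blkid -s UUID -o value /dev/sdb1)   # /dev/sdb1 is a hypothetical device
fstab_entry b20d7a97-74c1-4a01-8e28-00c7f6af7126 0
```

Review the printed lines before appending them to /etc/fstab, then mount the directories with mount -a.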

And again, we have to add the OSD info to ceph.conf:

[osd.0]
host = ceph01
public_addr =
cluster_addr =

We have to add an [osd.x] section for each OSD in our cluster. In my example there will be 9 OSDs, 3 per OSD server. This is a test environment, which is why the MON and OSD services run on the same machines. In a production environment you should place the MON servers on separate machines, which is considered best practice.
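
Writing nine [osd.x] sections by hand is error-prone, so here is a sketch that prints them. It assumes the layout of this test setup (hosts ceph01 to ceph03, 3 OSDs each, numbered in host order); osd_sections is an illustrative name, and the address lines are left out because they differ per host.

```shell
# Print one [osd.N] section per OSD: 3 servers with 3 OSDs each, as in this setup.
# Address lines are left out on purpose; add public_addr / cluster_addr per host.
osd_sections() {
    osds_per_host=3
    for h in 1 2 3; do
        host=$(printf 'ceph%02d' "$h")
        for i in $(seq 0 $(( osds_per_host - 1 ))); do
            printf '[osd.%d]\nhost = %s\n\n' $(( (h - 1) * osds_per_host + i )) "$host"
        done
    done
}

osd_sections
```

The output can be reviewed and then appended to ceph.conf on every node.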

On each OSD server, let’s generate an array of UUIDs, one for each OSD:

for i in $(seq 0 2); do
    osduuids[i]=$(uuidgen);
    echo ${osduuids[i]};
done

Then create the OSDs on each OSD server, using the UUIDs we generated in the previous step:

echo "OSD Creating...";
for i in $(seq 0 2); do
    ceph osd create ${osduuids[i]};
    echo ${osduuids[i]};
done

OSD numbers are assigned automatically, starting from 0 (zero).
***It is not necessary to specify a UUID in the “ceph osd create” command; it can be generated on the fly. But I ran into a problem: if I didn’t generate the UUIDs beforehand, the OSDs would not start.

OK. Now initialize the OSD data directories:

echo "OSD make FS...";
osdnum=0;
for i in $(seq 0 2); do
    echo "OSD_$osdnum refers to ${osduuids[${i}]}";
    ceph-osd -i ${osdnum} --mkfs --mkkey --osd-uuid ${osduuids[${i}]};
    osdnum=$((osdnum + 1));
done

Here the osdnum variable is the OSD index number we start from. The example above is for the first OSD server. On the second server we start from 3 (osdnum=3;), and so on.
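
The starting value of osdnum follows from the server’s position and the number of OSDs per server. A tiny sketch of that arithmetic (first_osdnum is an illustrative helper, and it assumes the OSDs were created server by server so the IDs are sequential):

```shell
# First OSD number for a server.
# $1 = 1-based position of the server, $2 = number of OSDs per server
first_osdnum() {
    echo $(( ($1 - 1) * $2 ))
}

for server in 1 2 3; do
    echo "server $server starts at osdnum=$(first_osdnum "$server" 3)"
done
```

If the OSDs were not created in strict server order, this shortcut does not hold; in that case rely on the ID that ceph osd create prints for each OSD.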

Register the auth key for each OSD:

echo "OSD auth creating..."
osdnum=0;
for i in $(seq 0 2); do
    ceph auth add osd.${osdnum} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-${osdnum}/keyring;
    osdnum=$((osdnum + 1));
done

Again, remember to set osdnum to the right starting value for each server (see above).

Creating the CRUSH map.

It is really important to create a CRUSH map. This map defines the structure of your cluster, that is, how copies of data are distributed across OSDs.
We can get the current CRUSH map as a starting point:
ceph osd getcrushmap -o /tmp/crushmap
and then decompile it into a text file:
crushtool -d /tmp/crushmap -o /tmp/de-crushmap
Now you can edit the file.
When you finish, you have to compile it back:
crushtool -c {decompiled-crush-map-filename} -o {compiled-crush-map-filename}
and then load it into the cluster:
ceph osd setcrushmap -i {compiled-crushmap-filename}
You can also edit the CRUSH map on the fly.
Add a host to the cluster: ceph osd crush add-bucket ceph01 host
Move the host under the root section of the map: ceph osd crush move ceph01 root=default
Add an OSD to the ceph01 section of the map: ceph osd crush add osd.0 0.02 host=ceph01
You can see the final CRUSH map here.
And the final cluster configuration file ceph.conf here.
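
For the on-the-fly variant, the three commands have to be repeated for every host and OSD. Below is a dry-run sketch that only prints the commands for this 3x3 test layout; the function name crush_commands and the run variable are my own, and the weight 0.02 is simply the value used in the example above.

```shell
# Dry run: print the CRUSH commands for 3 hosts with 3 OSDs each.
# Set run="" to execute the commands for real on a node with admin access.
run="echo"

crush_commands() {
    for h in 1 2 3; do
        host=$(printf 'ceph%02d' "$h")
        $run ceph osd crush add-bucket "$host" host
        $run ceph osd crush move "$host" root=default
        for i in 0 1 2; do
            $run ceph osd crush add "osd.$(( (h - 1) * 3 + i ))" 0.02 "host=$host"
        done
    done
}

crush_commands
```

Printing first makes it easy to check the host and OSD numbering before anything touches the live map.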

Starting the OSDs.

The ceph osd tree command shows the OSD tree (surprising, I know):

[root@ceph01 ~]# ceph osd tree
-1 6.00000 root default
-2 2.00000 host ceph01
0 0.79999 osd.0 down 1.00000 1.00000
1 0.79999 osd.1 down 1.00000 1.00000

Once we have finished configuring the cluster and the CRUSH map, we can start the OSDs:

[root@ceph01 ~]# service ceph start osd.0
=== osd.0 ===
create-or-move updated item name 'osd.0' weight 0.02 at location {host=ceph01,root=default} to crush map
Starting Ceph osd.0 on ceph01...
Running as unit run-21213.service.

It is possible to start all OSDs on a particular server:
service ceph start osd
Keep in mind that PGs (placement groups) will not reach the active+clean state if you start fewer OSDs than the replication factor (the number of data copies). For example, our CRUSH map places each copy of the data on a different server (two copies can never be on the same server), so you have to start at least 3 OSDs, each on a different server.
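A rough way to check this condition is to count how many hosts have at least one OSD up. The sketch below parses text in the column layout of the ceph osd tree output shown earlier; the helper name hosts_with_up_osd is hypothetical, and the sample states are invented for illustration.

```shell
# Count hosts that have at least one OSD in the "up" state.
# Assumption: input has the column layout of the "ceph osd tree" output shown above,
# so host rows carry the host name in column 4 and OSD rows carry up/down in column 4.
hosts_with_up_osd() {
    awk '
        $3 == "host"                 { host = $4 }
        $3 ~ /^osd\./ && $4 == "up"  { up[host] = 1 }
        END { n = 0; for (h in up) n++; print n }
    '
}

# Real use: ceph osd tree | hosts_with_up_osd
# Sample input for illustration (hypothetical states):
printf '%s\n' \
    '-1 6.00000 root default' \
    '-2 2.00000 host ceph01' \
    '0 0.79999 osd.0 up 1.00000 1.00000' \
    '1 0.79999 osd.1 down 1.00000 1.00000' \
    '-3 2.00000 host ceph02' \
    '3 0.79999 osd.3 down 1.00000 1.00000' \
    | hosts_with_up_osd
```

If the printed number is lower than the pool size (3 in this setup), the PGs cannot become active+clean yet.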
Here you can see the final cluster state after all settings.


© 2015