Difference between revisions of "OQMD2PyChemiaDB"
(2 intermediate revisions by the same user not shown)
Latest revision as of 01:41, 15 August 2017
Recreating the OQMD database and PyChemiaDB
Install MariaDB
On a system running RHEL/CentOS, the commands below assume that you have become root; otherwise, prefix each command with sudo. We also install mariadb-devel, which is needed to build one of the dependencies of qmpy.
sudo yum install mariadb-server mariadb mariadb-test mariadb-devel
Activate MariaDB service
To make the service available immediately, as well as after any future restart of the machine:
systemctl start mariadb.service
systemctl enable mariadb.service
Check the service is actually running
systemctl status mariadb.service
The answer should be something like:
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-08-14 20:46:09 EDT; 31s ago
 Main PID: 8439 (mysqld_safe)
   CGroup: /system.slice/mariadb.service
           ├─8439 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─8596 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/va...

Aug 14 20:46:07 mdg16.wvu.edu mariadb-prepare-db-dir[8357]: The latest information about MariaDB is available at http://mariadb.org/.
Aug 14 20:46:07 mdg16.wvu.edu mariadb-prepare-db-dir[8357]: You can find additional information about the MySQL part at:
Aug 14 20:46:07 mdg16.wvu.edu mariadb-prepare-db-dir[8357]: http://dev.mysql.com
Aug 14 20:46:07 mdg16.wvu.edu mariadb-prepare-db-dir[8357]: Support MariaDB development by buying support/new features from MariaDB
Aug 14 20:46:07 mdg16.wvu.edu mariadb-prepare-db-dir[8357]: Corporation Ab. You can contact us about this at sales@mariadb.com.
Aug 14 20:46:07 mdg16.wvu.edu mariadb-prepare-db-dir[8357]: Alternatively consider joining our community based development effort:
Aug 14 20:46:07 mdg16.wvu.edu mariadb-prepare-db-dir[8357]: http://mariadb.com/kb/en/contributing-to-the-mariadb-project/
Aug 14 20:46:07 mdg16.wvu.edu mysqld_safe[8439]: 170814 20:46:07 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
Aug 14 20:46:07 mdg16.wvu.edu mysqld_safe[8439]: 170814 20:46:07 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Aug 14 20:46:09 mdg16.wvu.edu systemd[1]: Started MariaDB database server.
You can also confirm that you can get initial access from the command line
$ mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2
Server version: 5.5.52-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

MariaDB [(none)]> quit
Bye
Secure the installation
A basic securing script assigns a password to the MariaDB root account and removes the test database entirely.
Just press enter when asked for the current root password inside MariaDB. That root account has nothing to do with the root account of the system, and the two passwords do not need to match.
$ mysql_secure_installation

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
      SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current
password for the root user. If you've just installed MariaDB, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MariaDB
root user without the proper authorisation.

Set root password? [Y/n]
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
 ... Success!

By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n]
 ... Success!

Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n]
 ... Success!

By default, MariaDB comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n]
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n]
 ... Success!

Cleaning up...

All done! If you've completed all of the above steps, your MariaDB
installation should now be secure.

Thanks for using MariaDB!
Creating a non-root user with privileges on the database
Assuming that there is a user called 'mdg', we can give that user privileges to read and write all databases running on the system. You can also restrict the grant to a single database and to read-only access.
$ mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 12
Server version: 5.5.52-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE USER 'mdg'@'localhost';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT ALL PRIVILEGES ON * . * TO 'mdg'@'localhost'; FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

MariaDB [(none)]> quit
Bye
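The narrower grant mentioned above (one database, read-only) could look roughly like this. It is a minimal sketch rather than part of the original procedure: the helper name readonly_grant_sql is hypothetical, and the user mdg and database oqmd are simply the names used in this tutorial.

```shell
# Hypothetical helper: print the SQL that limits a user to read-only
# access on a single database (SELECT is the only privilege granted).
readonly_grant_sql() {
    user="$1"
    db="$2"
    printf "GRANT SELECT ON %s.* TO '%s'@'localhost';\n" "$db" "$user"
    printf "FLUSH PRIVILEGES;\n"
}

# Review the statements first; nothing is sent to the server here.
readonly_grant_sql mdg oqmd
```

After reviewing the output, it can be piped into the client, for example with readonly_grant_sql mdg oqmd | mysql -u root -p.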
Creating a New database
Let's create a new database called oqmd, into which the dump will be restored.
$ mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 16
Server version: 5.5.52-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database oqmd;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use oqmd;
Database changed
MariaDB [oqmd]> quit
Bye
Recreating the OQMD database from the dump
The webpage for the OQMD database is http://oqmd.org. The most recent database dump can be downloaded from that site.
At the time this tutorial was written, the command to download the most recent database was:
wget http://oqmd.org/static/downloads/qmdb__v1_1__102016.sql.gz
Once you have downloaded the compressed dump, uncompress it with:
gunzip qmdb__v1_1__102016.sql.gz
Once the file is uncompressed, populate the new database with:
mysql oqmd -u root -p < qmdb__v1_1__102016.sql
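If disk space is tight, the two steps above can be combined so that the uncompressed SQL file is never written to disk. The following is a sketch under that assumption; the helper name restore_dump is hypothetical, and the example call is left commented out because it needs a running MariaDB server.

```shell
# Hypothetical helper: stream a gzip-compressed SQL dump into a database
# without writing the uncompressed file to disk. "gunzip -c" writes to
# standard output and leaves the .gz file in place.
restore_dump() {
    dump_gz="$1"
    database="$2"
    gunzip -c "$dump_gz" | mysql -u root -p "$database"
}

# Example call (commented out because it needs a running MariaDB server):
# restore_dump qmdb__v1_1__102016.sql.gz oqmd
```

The trade-off is that the compressed file has to be decompressed again if the restore must be repeated.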
Once the database has been restored, you can check the number of entries:
$ mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 21
Server version: 5.5.52-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use oqmd;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [oqmd]> SELECT COUNT(*) FROM entries;
+----------+
| COUNT(*) |
+----------+
|   471857 |
+----------+
1 row in set (0.07 sec)

MariaDB [oqmd]> quit
Bye
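The same check can be run without an interactive session, which is convenient in scripts. This is a minimal sketch: count_rows is a hypothetical helper name, and it assumes the root account and the entries table shown above.

```shell
# Hypothetical helper: print the number of rows in a table without
# opening an interactive session.
#   -N  suppress the column-name header
#   -B  batch mode (tab-separated, no table borders)
#   -e  execute the statement and exit
count_rows() {
    database="$1"
    table="$2"
    mysql -u root -p -N -B -e "SELECT COUNT(*) FROM $table;" "$database"
}

# Example call (needs a running server and the restored database):
# count_rows oqmd entries
```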
The OQMD database is ready and we can continue creating the PyChemiaDB database.
Installing MongoDB
The best way to keep MongoDB up to date is to use the repository provided by its developers. As root, create the file:
emacs /etc/yum.repos.d/mongodb-org-3.4.repo
And write the following inside:
[mongodb-org-3.4]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.4/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.4.asc
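As an alternative to typing the file in an editor, the same content can be written from the shell. This is a sketch, not the author's method: the REPO_FILE variable is an assumption introduced here, and by default the block writes to a scratch file so that it can be inspected before being copied to /etc/yum.repos.d/ as root.

```shell
# Write to a scratch file by default; set REPO_FILE to
# /etc/yum.repos.d/mongodb-org-3.4.repo (as root) to install it directly.
repo_file="${REPO_FILE:-$(mktemp)}"

# The quoted 'EOF' delimiter stops the shell from expanding $releasever,
# which must reach yum literally.
cat > "$repo_file" <<'EOF'
[mongodb-org-3.4]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.4/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.4.asc
EOF

echo "Repository definition written to $repo_file"
```

With an unquoted delimiter the shell would replace $releasever with an empty string and the baseurl would be wrong.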
After that, simply install MongoDB from the repository:
sudo yum install -y mongodb-org
Start the service and enable it to start automatically at every reboot of the machine:
$ systemctl enable mongod
$ systemctl start mongod
Test that the service is actually running
$ systemctl status mongod
● mongod.service - High-performance, schema-free document-oriented database
   Loaded: loaded (/usr/lib/systemd/system/mongod.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-08-15 00:43:04 EDT; 5s ago
     Docs: https://docs.mongodb.org/manual
  Process: 11328 ExecStartPre=/usr/bin/chmod 0755 /var/run/mongodb (code=exited, status=0/SUCCESS)
  Process: 11325 ExecStartPre=/usr/bin/chown mongod:mongod /var/run/mongodb (code=exited, status=0/SUCCESS)
  Process: 11323 ExecStartPre=/usr/bin/mkdir -p /var/run/mongodb (code=exited, status=0/SUCCESS)
 Main PID: 11334 (mongod)
   CGroup: /system.slice/mongod.service
           └─11334 /usr/bin/mongod -f /etc/mongod.conf

Aug 15 00:43:04 mdg16.wvu.edu systemd[1]: Starting High-performance, schema-free document-oriented database...
Aug 15 00:43:04 mdg16.wvu.edu systemd[1]: Started High-performance, schema-free document-oriented database.
Aug 15 00:43:04 mdg16.wvu.edu mongod[11331]: about to fork child process, waiting until server is ready for connections.
Aug 15 00:43:04 mdg16.wvu.edu mongod[11331]: forked process: 11334
Aug 15 00:43:05 mdg16.wvu.edu mongod[11331]: child process started successfully, parent exiting
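When the check has to run inside a script, reading the full status report is unnecessary; the exit status of systemctl is-active is enough. A minimal sketch follows, where service_is_active is a hypothetical helper name and the example call is commented out because it needs systemd and the mongod unit.

```shell
# Hypothetical helper: succeed only if the given systemd unit is active.
# "systemctl is-active --quiet" prints nothing and reports through its
# exit status.
service_is_active() {
    systemctl is-active --quiet "$1"
}

# Example (commented out because it needs systemd and the mongod unit):
# service_is_active mongod && echo "mongod is running"
```

The same helper works for the mariadb unit configured earlier.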
Use the command-line shell to connect to MongoDB and list the existing databases:
$ mongo
MongoDB shell version v3.4.7
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.7
Server has startup warnings:
2017-08-15T00:43:04.958-0400 I CONTROL [initandlisten]
2017-08-15T00:43:04.958-0400 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten]
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten]
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** WARNING: You are running on a NUMA machine.
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** We suggest launching mongod like this to avoid performance problems:
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** numactl --interleave=all mongod [other options]
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten]
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten]
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2017-08-15T00:43:04.959-0400 I CONTROL [initandlisten]
> show dbs
admin 0.000GB
local 0.000GB
> exit
bye
For the time being we can ignore the startup warnings above and concentrate on creating the new PyChemiaDB database.
Installing qmpy
The Python package qmpy serves as the interface between the OQMD database and Python. PyChemia uses qmpy to search for the best candidate for each entry in the database. We will download qmpy directly from GitHub. If the git command is not present, install it (as root or with sudo):
yum install git
Now, download qmpy from the official repository
git clone https://github.com/wolverton-research-group/qmpy.git
qmpy has a number of prerequisites, many of which are also needed by PyChemia. Install all the prerequisites with pip.
If you do not have pip installed, install it with yum (CentOS/RHEL). We also need the development packages, in order to compile those packages that are not pure Python code. For RHEL 7.4 use the command:
sudo yum install python2-pip python34-pip python-devel python34-devel
Most scientific software depends on quite recent versions of many packages, so it is generally not a good idea to rely on the packages provided by the official repositories of the Linux distribution. Using pip you can install packages that are far more recent. That is the path we will follow to satisfy all the dependencies. The next commands assume that you are now using a personal account.
The first step is to use pip to upgrade pip to the latest version
pip install --upgrade pip --user
Using the --user option installs the package in your home folder, usually under ~/.local. You can add ~/.local/bin to your PATH to give preference to the programs installed there. One of the dependencies, spglib, requires a more recent version of setuptools, so we need to upgrade that package too:
~/.local/bin/pip install --upgrade setuptools --user
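Rather than hard-coding the per-user directory, you can ask Python where --user installs go and add its bin folder to PATH. The following is a minimal sketch: prepend_path_once is a hypothetical helper name, and the fallback value is an assumption that only applies when no python command is found.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
prepend_path_once() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
}

# Ask Python where --user installs go (typically ~/.local on Linux).
if command -v python >/dev/null 2>&1; then
    user_base="$(python -m site --user-base)"
else
    user_base="$HOME/.local"   # assumption: default per-user base on Linux
fi

prepend_path_once "$user_base/bin"
export PATH
```

Placing these lines in your shell start-up file makes the setting permanent, and the helper keeps PATH free of duplicate entries when the file is sourced more than once.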
Now we can proceed to install the most recent versions of the Python packages needed by qmpy and PyChemia:
~/.local/bin/pip install --upgrade django pulp numpy scipy matplotlib networkx pytest python-memcached \
    ase django-extensions lxml pyparsing spglib pycifrw pyyaml scikit-learn pymongo future nose --user
The one dependency that still remains is mysql-python. This Python package needs mariadb-devel; if you did not install it before, this is the moment to do it:
yum install mariadb-devel
You can test that you have a command called mysql_config available on your system.
$ mysql_config --version
5.5.52
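The check can be folded into the installation step, so that the build is only attempted when mysql_config is present. This is a sketch under the assumption that mysql-python is installed with pip and --user like the other dependencies; the helper names have_command and install_mysql_python are hypothetical.

```shell
# Hypothetical helper: succeed if a command is available on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Build mysql-python only when mysql_config (from mariadb-devel) is present.
install_mysql_python() {
    if have_command mysql_config; then
        pip install --user mysql-python
    else
        echo "mysql_config not found; install mariadb-devel first" >&2
        return 1
    fi
}

# Example call (commented out because it downloads and compiles a package):
# install_mysql_python
```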