GBase database (GBase数据库) provides robust capabilities for modern data infrastructure. This guide walks through the step-by-step installation of GCDW (GBase Cloud Data Warehouse) and the configuration of HDFS-related settings, ensuring seamless integration and optimal performance. By following this guide, you will set up a secure and efficient data warehouse environment that leverages both GBase and Hadoop technologies.
Table of Contents
- Prerequisites
- Resource Limit Configuration
- Hosts File Configuration
- Configuring Trust for gbase User
- Installing GBase Cloud Data Warehouse (GCDW)
- Configuring HDFS with Kerberos Authentication
- Updating GCluster and GNode Configuration
- Enabling Auto-Startup on Boot
- Creating a Warehouse
- Testing the Setup
- Conclusion
Prerequisites
- Operating System: CentOS 7 or later, Red Hat Enterprise Linux 7 or later
- Hardware Requirements:
- CPU: Quad-core processor or higher
- Memory: At least 16 GB RAM
- Storage: Minimum of 500 GB free disk space
- Network Configuration:
- Static IP addresses for all nodes
- Proper hostname resolution
- User Permissions: Root or sudo access
- Software Packages:
- Java JDK 1.8 or higher
- SSH installed and configured for password-less login between nodes
- Kerberos: KDC server installed and configured
- Hadoop Cluster: Installed and configured with Kerberos authentication
Resource Limit Configuration
Step 1: Edit /etc/security/limits.conf
Add the following configurations to set resource limits for the gbase user:
gbase soft nofile 65536
gbase hard nofile 65536
gbase soft nproc unlimited
gbase hard nproc unlimited
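Because pam_limits applies these settings at login, you can confirm they are picked up by opening a fresh session as gbase and checking the values; the expected numbers below simply mirror the settings above:
su - gbase -c 'ulimit -n'   # expected: 65536 (open files)
su - gbase -c 'ulimit -u'   # expected: unlimited (max user processes)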
Step 2: Distribute the limits.conf File Across All Nodes
Use a tool like scp or a cluster management script to distribute the file:
scp /etc/security/limits.conf root@<node_ip>:/etc/security/limits.conf
Repeat this for all nodes in the cluster.
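If you prefer to script the copy, a minimal sketch follows; the hostnames node1, node2, and node3 are placeholders for your actual cluster nodes:
for node in node1 node2 node3; do
  scp /etc/security/limits.conf root@${node}:/etc/security/limits.conf
done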
Hosts File Configuration
Step 1: Edit /etc/hosts
Add the IP addresses and hostnames for all nodes, including:
- Primary and secondary NameNodes in the Hadoop cluster
- All DataNodes
- KDC domain and its corresponding hostname
Example:
192.168.1.10 namenode1.hadoop.com namenode1
192.168.1.11 namenode2.hadoop.com namenode2
192.168.1.20 datanode1.hadoop.com datanode1
192.168.1.21 datanode2.hadoop.com datanode2
192.168.1.30 kdc.hadoop.com kdc
Step 2: Distribute the /etc/hosts File Across All Nodes
Again, use scp or a cluster management script:
scp /etc/hosts root@<node_ip>:/etc/hosts
Repeat for all nodes.
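After distribution, it is worth confirming that every hostname resolves as expected; a quick check with getent, using the example hostnames above, might look like:
for host in namenode1.hadoop.com namenode2.hadoop.com datanode1.hadoop.com datanode2.hadoop.com kdc.hadoop.com; do
  getent hosts $host   # prints the IP address and hostname if resolution works
done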
Configuring Trust for gbase User
Set up SSH password-less authentication for the gbase user across all nodes.
Step 1: Generate SSH Keys for the gbase User
On the primary node:
su - gbase
ssh-keygen -t rsa -b 2048
Press Enter to accept the default file location and leave the passphrase empty.
Step 2: Distribute the Public Key
Copy the public key to all nodes:
for node in node1 node2 node3; do
ssh-copy-id -i ~/.ssh/id_rsa.pub gbase@$node
done
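To verify that password-less login works before proceeding, a loop such as the following (with the same placeholder node names) should print each remote hostname without prompting for a password:
for node in node1 node2 node3; do
  ssh gbase@$node hostname
done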
Installing GBase Cloud Data Warehouse (GCDW)
Step 1: Extract the Installation Package
Assuming you have the installation package at /opt/tools/gcdw-NoLicense-9.8.0.7.6-redhat8-x86_64.tar.bz2:
sudo mkdir -p /opt/gcdw
sudo tar -xf /opt/tools/gcdw-NoLicense-9.8.0.7.6-redhat8-x86_64.tar.bz2 -C /opt/gcdw/
Step 2: Set Permissions for the GCDW Directory
sudo chown -R gbase:gbase /opt/gcdw/
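A quick sanity check that the extraction and ownership change succeeded, assuming the package unpacks a gcinstall directory as referenced in the next steps:
ls -ld /opt/gcdw            # owner and group should both be gbase
ls /opt/gcdw/gcinstall      # should include SetSysEnv.py and gcinstall.py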
Step 3: Distribute and Execute SetSysEnv.py
Distribute the Script
Copy the SetSysEnv.py script to all nodes:
scp /opt/gcdw/gcinstall/SetSysEnv.py gbase@<node_ip>:/opt/gcdw/SetSysEnv.py
Execute the Script on All Nodes
Log in to each node and run:
su - gbase
python /opt/gcdw/SetSysEnv.py --dbaUser=gbase --installPrefix=/opt/gcdw --cgroup
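Instead of logging in to each node by hand, you could run the script remotely over SSH; this sketch assumes the placeholder node names used earlier and that the script has already been copied to each node:
for node in node1 node2 node3; do
  ssh gbase@$node "python /opt/gcdw/SetSysEnv.py --dbaUser=gbase --installPrefix=/opt/gcdw --cgroup"
done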
Configuring HDFS with Kerberos Authentication
Step 1: Configure demo.options for Kerberos Authentication
Create a configuration file named demo.options in /opt/gcdw/gcinstall/ with the following content:
gcluster_instance_name=instance_name
instance_root_name=root
instance_root_password=root_password
gcdw_STORAGE_STYLE=hdfs
gcdw_HDFS_URI=hdfs://namenode.hadoop.com:8020/
gcdw_HDFS_AUTH_MODE=kerberos
gcdw_HDFS_PRINCIPAL=hdfs/[email protected]
gcdw_HDFS_KEYTAB=/etc/hdfs.keytab
gcdw_HDFS_KERBEROS_CONFIG=/etc/krb5.conf
- Replace instance_name with your desired instance name.
- Update namenode.hadoop.com with your NameNode's hostname.
- Ensure the hdfs.keytab file and krb5.conf are correctly placed and accessible.
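Before running the installer, you can optionally confirm that the keytab and principal work against your KDC; kinit should obtain a ticket without prompting for a password, and klist should show it:
kinit -kt /etc/hdfs.keytab hdfs/[email protected]
klist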
Step 2: Execute the Deployment Script
Navigate to the installation directory:
cd /opt/gcdw/gcinstall
Run the installation script:
./gcinstall.py --silent=demo.options
Follow any on-screen prompts to complete the installation.
Step 3: Validate Installation
After installation, validate with the following command:
gcadmin account --show
Ensure that the account information displays correctly.
Updating GCluster and GNode Configuration
Step 1: Update GCluster Configuration
On all nodes, edit /opt/<node_ip>/gcluster/config/gbase_8a_gcluster.cnf:
[gbased]
gcdw_hdfs_client_timeout=600
_t_gcluster_support_cte=1
table_definition_cache=5120
table_open_cache=1280
gcluster_random_insert=1
gcluster_send_client_data_timeout=1800
group_concat_max_len=10240
gbase_hdfs_auth_mode=kerberos
gbase_hdfs_protocol=rpc
gbase_hdfs_keytab=/etc/hdfs.keytab
gbase_hdfs_principal=hdfs/[email protected]
gcdw_hdfs_namenodes=192.168.1.10,192.168.1.11|namenode1.hadoop.com,namenode2.hadoop.com
- Replace the IP addresses and hostnames with your actual NameNode IPs and hostnames.
- Ensure the keytab and principal are correctly specified.
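After editing, a quick way to confirm the Kerberos-related settings landed in the file on the current node (the wildcard stands in for the per-node directory) is:
grep -E 'gbase_hdfs|gcdw_hdfs' /opt/*/gcluster/config/gbase_8a_gcluster.cnf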
Step 2: Restart GCluster Service
su - gbase
gcluster_services all restart
Step 3: Update GNode Configuration
Edit /opt/<node_ip>/gnode/config/gbase_8a_gbase.cnf:
[gbased]
gbase_loader_parallel_degree=4
gbase_parallel_degree=4
gbase_parallel_max_thread_in_pool=512
gbase_loader_read_timeout=5000
gbase_loader_max_line_length=32M
thread_stack=524288
gbase_hdfs_auth_mode=kerberos
gbase_hdfs_protocol=rpc
gbase_hdfs_keytab=/etc/hdfs.keytab
gbase_hdfs_principal=hdfs/[email protected]
gcdw_hdfs_namenodes=192.168.1.10,192.168.1.11|namenode1.hadoop.com,namenode2.hadoop.com
Step 4: Restart GBase Service
su - gbase
gbase_services all restart
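As a rough check that the services came back up, you can look for the gbased processes; this assumes the process name matches the [gbased] section used in the configuration files:
ps -ef | grep gbased | grep -v grep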
Enabling Auto-Startup on Boot
Enable GCDW services to start automatically on system boot.
Step 1: Edit /etc/rc.d/rc.local
Add the following lines:
su - gbase -c "gcluster_services all start"
su - gbase -c "gcware_services all start"
Step 2: Make the Script Executable
sudo chmod +x /etc/rc.d/rc.local
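On systemd-based distributions such as CentOS 7, /etc/rc.d/rc.local is executed by the rc-local compatibility service only when the file is executable, so after the chmod you can confirm the service is present with:
systemctl status rc-local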
Creating a Warehouse
Step 1: Use gcadmin to Create a Warehouse Template
su - gbase
gcadmin createwh e wh.xml
Step 2: Edit wh.xml
Modify the wh.xml file to include:
- Node IPs in the <NodeList> section
- Warehouse name and comment
Example:
<Warehouse>
<Name>my_warehouse</Name>
<Comment>Production Warehouse</Comment>
<NodeList>
<Node>192.168.1.20</Node>
<Node>192.168.1.21</Node>
</NodeList>
</Warehouse>
Step 3: Create the Warehouse
gcadmin createwh wh.xml
Verify that the warehouse is created successfully.
Testing the Setup
Step 1: Connect to GBase
/home/gbase/GCDW/bin/gcdw_client -h localhost -p 5432 -U gbase -W
Step 2: Create an External Table Pointing to HDFS
CREATE EXTERNAL TABLE hdfs_data (
id INT,
name VARCHAR(100)
)
LOCATION ('hdfs://namenode.hadoop.com:8020/user/gbase/data.csv')
FORMAT 'CSV' (DELIMITER ',');
Step 3: Load Data into HDFS
Put your data file into HDFS:
hdfs dfs -mkdir -p /user/gbase
hdfs dfs -put data.csv /user/gbase/
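With Kerberos enabled on the cluster, the hdfs commands above require a valid ticket; a quick sequence to obtain one with the same keytab and confirm the file landed in HDFS might be:
kinit -kt /etc/hdfs.keytab hdfs/[email protected]
hdfs dfs -ls /user/gbase/
hdfs dfs -cat /user/gbase/data.csv | head -5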
Step 4: Query the External Table
SELECT * FROM hdfs_data LIMIT 10;
Verify that the data is correctly retrieved from HDFS.
Step 5: Perform Data Analytics
Run aggregate functions or joins to test performance:
SELECT COUNT(*) FROM hdfs_data;
Step 6: Monitor Logs and Performance
- Check GBase logs located in /var/log/gcdw for any errors.
- Use Hadoop's web interfaces to monitor HDFS and resource usage.
Conclusion
At this point, you have successfully installed GBase Cloud Data Warehouse and configured it with the Hadoop Distributed File System using Kerberos authentication. This setup provides a secure and scalable data management solution, ready for enterprise-level demands.