Streaming Oracle Database Logs to HBase with Flume

In the previous tutorial we discussed streaming Oracle logs to HDFS using Flume. Flume supports various types of sources and sinks, including the HBase database as a sink. In this tutorial we shall discuss streaming an Oracle log file to HBase.
 
 

Setting the Environment

 
We use the same environment as in the tutorial on streaming to HDFS: Oracle Database 11g installed on Oracle Linux 6.5 on VirtualBox 4.3. We need to download and install the following software.
 
  1. Oracle Database 11g
  2. HBase
  3. Java 7
  4. Flume 1.4
  5. Hadoop 2.0.0
 
First, create a directory to install the software and set its permissions.
 
mkdir /flume
chmod -R 777 /flume
cd /flume
 
Create the hadoop group and add the hbase user to the hadoop group.
 
>groupadd hadoop
>useradd -g hadoop hbase
 
Download and install Java 7.
 
>tar zxvf jdk-7u55-linux-i586.tar.gz
 
 Download and install CDH 4.6 Hadoop 2.0.0.
 
>wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz
>tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz
 
Create symlinks for Hadoop bin and conf files.
 
>ln -s /flume/hadoop-2.0.0-cdh4.6.0/bin-mapreduce1 /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/bin
>ln -s /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop /flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1/conf
 
Download and install CDH 4.6 Flume 1.4.0.
 
wget http://archive-primary.cloudera.com/cdh4/cdh/4/flume-ng-1.4.0-cdh4.6.0.tar.gz
tar -xvf flume-ng-1.4.0-cdh4.6.0.tar.gz
 
Download and install CDH 4.6 HBase 0.94.15.
 
wget http://archive.cloudera.com/cdh4/cdh/4/hbase-0.94.15-cdh4.6.0.tar.gz
tar -xvf hbase-0.94.15-cdh4.6.0.tar.gz
 
Set permissions of the Flume root directory to global.
 
chmod 777 -R /flume/apache-flume-1.4.0-cdh4.6.0-bin
 
Set the environment variables for Oracle Database, Java, HBase, Flume, and Hadoop in the bash shell file.
 
vi ~/.bashrc
 
export HADOOP_PREFIX=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
export FLUME_HOME=/flume/apache-flume-1.4.0-cdh4.6.0-bin
export FLUME_CONF=/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf
export HBASE_HOME=/flume/hbase-0.94.15-cdh4.6.0
export HBASE_CONF=/flume/hbase-0.94.15-cdh4.6.0/conf
export JAVA_HOME=/flume/jdk1.7.0_55
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export HADOOP_MAPRED_HOME=/flume/hadoop-2.0.0-cdh4.6.0
export HADOOP_HOME=/flume/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce1
export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$HBASE_CONF:$HBASE_HOME/lib/*
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME/bin:$ORACLE_HOME/bin:$FLUME_HOME/bin:$HBASE_HOME/bin
export CLASSPATH=$HADOOP_CLASSPATH
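 
Reload the shell configuration so the new variables take effect in the current session; the echo line is just an illustrative sanity check.
 
source ~/.bashrc
echo $FLUME_HOME $HBASE_HOME $HADOOP_HOME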
 

Starting HDFS

 
In this section we shall configure and start HDFS. Change to the Hadoop configuration directory.
 
cd /flume/hadoop-2.0.0-cdh4.6.0/etc/hadoop
 
Set the NameNode URI (fs.defaultFS) and the Hadoop temporary directory (hadoop.tmp.dir) configuration properties in the core-site.xml file.
 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.2.15:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///var/lib/hadoop-0.20/cache</value>
  </property>
</configuration>
 
Remove any previously created temporary directory, then recreate it and set its permissions to global (777).
 
rm -rf /var/lib/hadoop-0.20/cache
mkdir -p /var/lib/hadoop-0.20/cache
chmod -R 777  /var/lib/hadoop-0.20/cache
 
Set the NameNode storage directory (dfs.namenode.name.dir), the superuser group (dfs.permissions.superusergroup), the replication factor (dfs.replication), the upper bound on the number of files a DataNode serves concurrently (dfs.datanode.max.xcievers), and permission checking (dfs.permissions) configuration properties in the hdfs-site.xml file.
 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>
 
Remove any previously created NameNode storage directory, then create a new directory and set its permissions to global (777).
 
rm -rf /data/1/dfs/nn
mkdir -p /data/1/dfs/nn
chmod -R 777 /data/1/dfs/nn
 
Format and start the NameNode. The hadoop namenode command runs in the foreground, so run it in its own terminal.
 
hadoop namenode -format
hadoop namenode
 
Start the DataNode, also a foreground process, in another terminal.
 
hadoop datanode
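 
To verify that the NameNode and DataNode are running, list the Java processes and, optionally, print the cluster report (output will vary by setup).
 
jps
hdfs dfsadmin -report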
 
We need to copy the jars in the Flume lib directory to HDFS so that they are available to the runtime. Create a directory in HDFS with the same directory structure as the Flume lib directory and set its permissions to global.
 
hadoop dfs -mkdir /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
hadoop dfs -chmod -R 777 /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
 
Put the Flume lib directory jars into HDFS.
 
hdfs dfs -put /flume/apache-flume-1.4.0-cdh4.6.0-bin/lib/* hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib
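 
To confirm the copy, list the HDFS directory (an illustrative check).
 
hdfs dfs -ls hdfs://10.0.2.15:8020/flume/apache-flume-1.4.0-cdh4.6.0-bin/lib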
 
Create the Flume configuration file flume.conf from the template. Also create the Flume env file flume-env.sh from the template.
 
cp $FLUME_HOME/conf/flume-conf.properties.template $FLUME_HOME/conf/flume.conf
cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh
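 
Optionally, set JAVA_HOME in flume-env.sh so the flume-ng script picks up the intended JDK; a minimal sketch using the JDK installed earlier:
 
echo "export JAVA_HOME=/flume/jdk1.7.0_55" >> $FLUME_HOME/conf/flume-env.sh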
 
We shall set the configuration properties for Flume in a subsequent section, but first we shall install HBase.
 

Starting HBase

 
In this section we shall configure and start HBase. HBase configuration is discussed in detail in another tutorial (http://www.toadworld.com/platforms/oracle/w/wiki/10976.loading-hbase-table-data-into-an-oracle-database-with-oracle-loader-for-hadoop.aspx). Set the HBase configuration in the /flume/hbase-0.94.15-cdh4.6.0/conf/hbase-site.xml configuration file as follows.
 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://10.0.2.15:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2182</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
</configuration>
 
Create the Zookeeper data directory and set its permissions.
 
mkdir -p /zookeeper
chmod -R 700 /zookeeper
 
As the root user, create the HBase root directory /hbase in HDFS and set its permissions to global (777).
 
root>hdfs dfs -mkdir /hbase
hdfs dfs -chmod -R 777 /hbase
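 
The new directory can be verified with a listing (an illustrative check; the output should include /hbase).
 
hdfs dfs -ls /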
 
As the root user, increase the maximum number of file handles in the /etc/security/limits.conf file by setting the following ulimits for the hdfs and hbase users.
 
hdfs  -       nofile  32768
hbase -       nofile  32768
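 
The limits take effect on the next login; they can be checked with the ulimit shell builtin, for example:
 
su - hbase -c 'ulimit -n'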
 
Start the HBase daemons: ZooKeeper, Master, and RegionServer.
 
hbase-daemon.sh start zookeeper
hbase-daemon.sh start master
hbase-daemon.sh start regionserver
 
The jps command should list the HDFS and HBase nodes as started.
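 
With all of the daemons running, the jps output should look similar to the following (the process IDs shown are illustrative):
 
jps
1234 NameNode
2345 DataNode
3456 HQuorumPeer
4567 HMaster
5678 HRegionServer
6789 Jps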
 
Start the HBase shell with the following command.
 
hbase shell
 
Create a table (flume) with a column family (orcllog) using the following command.
 
create 'flume', 'orcllog'
 
The HBase table gets created.
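 
The new table can be verified from the HBase shell (an illustrative check):
 
list
describe 'flume'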
 
  
 

Configuring the Flume Agent for HBase

 
In this section we shall set the Flume agent configuration in the flume.conf file. We shall configure the following properties in flume.conf for a Flume agent called hbase-agent.
 
Each entry below gives the configuration property, its description, and the value used.
 
Configuration Property: hbase-agent.channels
Description: The Flume agent channels. We shall be using only one channel, called ch1 (the channel name is arbitrary).
Value: hbase-agent.channels=ch1
 
Configuration Property: hbase-agent.sources
Description: The Flume agent sources. We shall be using one source of type exec, called tail (the source name is arbitrary).
Value: hbase-agent.sources=tail
 
Configuration Property: hbase-agent.sinks
Description: The Flume agent sinks. We shall be using one sink of type HBaseSink, called sink1 (the sink name is arbitrary).
Value: hbase-agent.sinks=sink1
 
Configuration Property: hbase-agent.channels.ch1.type
Description: The channel type is memory.
Value: hbase-agent.channels.ch1.type=memory
 
Configuration Property: hbase-agent.sources.tail.channels
Description: Define the flow by binding the source to the channel.
Value: hbase-agent.sources.tail.channels=ch1
 
Configuration Property: hbase-agent.sources.tail.type
Description: Specify the source type as exec.
Value: hbase-agent.sources.tail.type=exec
 
Configuration Property: hbase-agent.sources.tail.command
Description: Runs the specified Unix command and produces data on stdout. Commonly used commands are cat, to copy a complete log file to stdout, and tail, to copy the last KB of a log file. We shall be demonstrating both of these commands.
Value: hbase-agent.sources.tail.command=tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
or: hbase-agent.sources.tail.command=cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
 
Configuration Property: hbase-agent.sinks.sink1.channel
Description: Define the flow by binding the sink to the channel.
Value: hbase-agent.sinks.sink1.channel=ch1
 
Configuration Property: hbase-agent.sinks.sink1.type
Description: Specify the sink type as HBaseSink or AsyncHBaseSink.
Value: hbase-agent.sinks.sink1.type=org.apache.flume.sink.hbase.HBaseSink
 
Configuration Property: hbase-agent.sinks.sink1.table
Description: Specify the HBase table name.
Value: hbase-agent.sinks.sink1.table=flume
 
Configuration Property: hbase-agent.sinks.sink1.columnFamily
Description: Specify the HBase table column family.
Value: hbase-agent.sinks.sink1.columnFamily=orcllog
 
Configuration Property: hbase-agent.sinks.sink1.column
Description: Specify the column within the HBase table column family.
Value: hbase-agent.sinks.sink1.column=c1
 
Configuration Property: hbase-agent.sinks.sink1.serializer
Description: Specify the HBase event serializer class. The serializer converts a Flume event into one or more puts and/or increments.
Value: hbase-agent.sinks.sink1.serializer=org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
 
Configuration Property: hbase-agent.sinks.sink1.serializer.payloadColumn
Description: A parameter to the serializer. Specifies the payload column, the column into which the payload data is stored.
Value: hbase-agent.sinks.sink1.serializer.payloadColumn=coll
 
Configuration Property: hbase-agent.sinks.sink1.serializer.keyType
Description: A parameter to the serializer. Specifies the row key type.
Value: hbase-agent.sinks.sink1.serializer.keyType=timestamp
 
Configuration Property: hbase-agent.sinks.sink1.serializer.incrementColumn
Description: A parameter to the serializer. Specifies the column to be incremented. The SimpleHbaseEventSerializer may optionally be set to increment a column in HBase.
Value: hbase-agent.sinks.sink1.serializer.incrementColumn=coll
 
Configuration Property: hbase-agent.sinks.sink1.serializer.rowPrefix
Description: A parameter to the serializer. Specifies the row key prefix to be used.
Value: hbase-agent.sinks.sink1.serializer.rowPrefix=1
 
Configuration Property: hbase-agent.sinks.sink1.serializer.suffix
Description: A parameter to the serializer. Specifies the row key suffix; one of the following values may be set: uuid, random, timestamp.
Value: hbase-agent.sinks.sink1.serializer.suffix=timestamp
 
The complete flume.conf file is listed below.
 
hbase-agent.sources=tail
hbase-agent.sinks=sink1
hbase-agent.channels=ch1
hbase-agent.sources.tail.type=exec
hbase-agent.sources.tail.command=tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
hbase-agent.sources.tail.channels=ch1
hbase-agent.sinks.sink1.type=org.apache.flume.sink.hbase.HBaseSink
hbase-agent.sinks.sink1.channel=ch1
hbase-agent.sinks.sink1.table=flume
hbase-agent.sinks.sink1.columnFamily=orcllog
hbase-agent.sinks.sink1.column=c1
hbase-agent.sinks.sink1.serializer=org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
hbase-agent.sinks.sink1.serializer.payloadColumn=coll
hbase-agent.sinks.sink1.serializer.keyType=timestamp
hbase-agent.sinks.sink1.serializer.incrementColumn=coll
hbase-agent.sinks.sink1.serializer.rowPrefix=1
hbase-agent.sinks.sink1.serializer.suffix=timestamp
hbase-agent.channels.ch1.type=memory
 
The alternative source exec command is as follows.
 
hbase-agent.sources.tail.command=cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log
 

Running the Flume Agent

 
In this section we shall run the Flume agent to stream the last KB of the alert_ORCL.log file to HBase using the tail command. We shall also stream the complete alert log file, alert_ORCL.log, using the cat command. Run the Flume agent using the flume-ng shell script, specifying the agent name with the -n option, the configuration directory with the --conf option, and the configuration file with the -f option. Set the Flume root logger with -Dflume.root.logger=INFO,console to log at INFO level to the console. Run the following command to start the Flume agent hbase-agent.
 
flume-ng agent --conf $FLUME_HOME/conf/ -f $FLUME_HOME/conf/flume.conf -n hbase-agent  -Dflume.root.logger=INFO,console
 
HBase libraries get included for HBase access.
 
 
The source and sink get started.
 
 
The flume log output provides more detail of the Flume agent command.
 
05 Dec 2014 22:20:57,147 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
05 Dec 2014 22:20:57,194 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:/flume/apache-flume-1.4.0-cdh4.6.0-bin/conf/flume.conf
05 Dec 2014 22:20:57,214 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:sink1
(org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [hbase-agent]
05 Dec 2014 22:20:57,502 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
05 Dec 2014 22:20:57,529 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel ch1 type memory
05 Dec 2014 22:20:57,543 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel ch1
05 Dec 2014 22:20:57,545 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source tail, type exec
05 Dec 2014 22:20:57,570 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: sink1, type: org.apache.flume.sink.hbase.HBaseSink
05 Dec 2014 22:20:58,218 INFO  [conf-file-poller-0] (org.apache.flume.sink.hbase.HBaseSink.configure:218)  - The write to WAL option is set to: true
05 Dec 2014 22:20:58,223 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.getConfiguration:119)  - Channel ch1 connected to [tail, sink1]
05 Dec 2014 22:20:58,238 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:138)  - Starting new configuration:{ sourceRunners:{tail=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:tail,state:IDLE} }} sinkRunners:{sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@a21d88 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
05 Dec 2014 22:20:58,240 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:145)  - Starting Channel ch1
05 Dec 2014 22:20:58,372 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:119)  - Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
05 Dec 2014 22:20:58,373 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:95)  - Component type: CHANNEL, name: ch1 started
05 Dec 2014 22:20:58,373 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:173)  - Starting Sink sink1
05 Dec 2014 22:20:58,375 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:184)  - Starting Source tail
05 Dec 2014 22:20:58,376 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.source.ExecSource.start:163)  - Exec source starting with command:tail -F /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace
05 Dec 2014 22:20:58,396 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:119)  - Monitored counter group for type: SOURCE, name: tail: Successfully registered new MBean.
05 Dec 2014 22:20:58,397 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:95)  - Component type: SOURCE, name: tail started
 

Scanning the HBase Table

 
In this section we shall scan the HBase table after each run of the Flume agent: first after running the tail -F command, and again after running the cat command. Run the following command in the HBase shell to scan the flume table.
 
scan 'flume'
 
The Oracle log file data streamed into HBase gets listed.
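 
Given the serializer settings (rowPrefix=1, suffix=timestamp, payloadColumn=coll), each row key is the prefix 1 followed by a timestamp, and each cell holds a chunk of log text in the orcllog:coll column. An illustrative row (the key, timestamp, and value will differ):
 
ROW                COLUMN+CELL
 11417846858376    column=orcllog:coll, timestamp=1417846858376, value=Starting ORACLE instance (normal)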
 
 
Run the scan 'flume' command again after running the Flume agent with the cat /home/oracle/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log command.
 
More rows get listed as the complete Oracle log file is streamed.

ChannelException

 
If the channel capacity gets exceeded while the Flume agent is streaming events, the following exception may be generated.
 
: java.lang.InterruptedException
org.apache.flume.ChannelException: Unable to put batch on required channel:
Caused by: org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired Sinks are likely not keeping up with sources, or the buffer size is too tight
 
A subsequent scan of the HBase table will list fewer rows than would have been listed had the complete log file been streamed without an exception.
 
 
To avoid the exception, increase the default channel capacity with the following configuration property in flume.conf.
 
hbase-agent.channels.ch1.capacity = 100000
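 
The memory channel's transactionCapacity and the sink's batchSize (both standard Flume properties) can be tuned alongside capacity; a minimal sketch, with illustrative values:
 
hbase-agent.channels.ch1.capacity = 100000
hbase-agent.channels.ch1.transactionCapacity = 1000
hbase-agent.sinks.sink1.batchSize = 100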
 
In this tutorial we streamed Oracle Database logs to HBase using Flume.