Thursday, January 21, 2016

[Hadoop] Importing/Exporting a MySQL Database <==> HDFS using Sqoop


  • Import all MySQL tables from the "training" database into the HDFS directory "hdfsdir"
$] sqoop import-all-tables --connect jdbc:mysql://localhost/training --username training --password training --warehouse-dir hdfsdir
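If only one table is needed instead of the whole database, sqoop import does the same for a single table; a minimal sketch, assuming the same credentials and a hypothetical target directory "moviesdir":

$] sqoop import --connect jdbc:mysql://localhost/training --username training --password training --table Movies --target-dir moviesdir -m 4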

  • Log into the Hive shell
  • Create an external table (without a location clause) with the same schema as the MySQL table "movies"
hive> create external table movies(eid int, ename String, code String) row format delimited fields terminated by ',';
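As a verification step, describe formatted shows the schema along with the warehouse directory Hive assigned to the table (since no LOCATION clause was given):

hive> describe formatted movies;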
  • Load the data from the HDFS location into the table "movies"
hive> load data inpath 'hdfs://localhost/user/training/hdfsdir/Movies/part-*' into table movies;
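Since load data inpath moves the files (rather than copying them) into the table's directory, a quick listing should show them under the default warehouse path used later in this post:

$] hadoop fs -ls /user/hive/warehouse/movies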
  • Run Hive Queries

hive> select eid, ename, code from movies where eid > 1000;
  • Overwrite the table "movies" with a filter. This deletes the "movies" directory in "/user/hive/warehouse" and recreates "/user/hive/warehouse/movies" with a file 000000_0; the contents of this file are the records matching the filter
hive> insert overwrite table movies select * from movies where eid>1670; 
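To inspect the rewritten file directly on HDFS, assuming the default warehouse path from above:

$] hadoop fs -cat /user/hive/warehouse/movies/000000_0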


  • Create a table in MySQL with the same schema

mysql> CREATE TABLE movies1 (eid int(11) NOT NULL, ename char(255), code char(255));
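Since sqoop export maps input fields to table columns positionally by default, it is worth confirming that the column order here matches the Hive table:

mysql> describe movies1;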


  • Export the data from Hive/HDFS into MySQL
$] sqoop export --connect jdbc:mysql://localhost/training --username training --password training --table movies1 -m 4 --export-dir /user/hive/warehouse/movies
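A quick check on the MySQL side confirms the export landed:

mysql> select count(*) from movies1;
mysql> select * from movies1 limit 5;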



Wednesday, January 13, 2016

BASH Script to Get the Child Process IDs recursively.

#!/bin/bash
# Recursively walk the process tree rooted at the given PID and
# print the PIDs of its leaf descendants (processes with no children).
function return_child_pid
{
   # Use local variables so recursive calls do not clobber each other.
   local pd=$1
   local count=0
   local p
   # $3 in `ps -ef` is the parent PID; collect every direct child of $pd.
   for p in $(ps -ef | awk '$3=="'"$pd"'" {print $2}')
   do
      count=$((count + 1))
      return_child_pid "$p"
   done
   # No children found: this process is a leaf, so print its PID.
   if [ "$count" -eq 0 ]; then
      echo "$pd"
   fi
}

return_child_pid "$1"
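A quick usage sketch, assuming the script is saved as get_leaf_pids.sh (a hypothetical name) and that 4242 is the PID of the process tree to inspect; the output PIDs are illustrative:

$] chmod +x get_leaf_pids.sh
$] ./get_leaf_pids.sh 4242
4250
4251

On systems that ship pgrep, "pgrep -P <pid>" offers a simpler way to list the direct children at each level of the recursion.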