您好,欢迎来到花图问答。
搜索
您的当前位置:首页使用Sqoop从MySQL导入数据到HBase和hive

使用Sqoop从MySQL导入数据到HBase和hive

来源:花图问答

有时候需要将mysql的全量数据导入到hive或者hbase中,使用sqoop是一个比较好用的工具,速度相对来说比较快。mysql的增量数据在用其他方法实时同步。

一、mysql同步到hbase

执行命令部分日志:

[hdfs@slave1 ~]$ sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key ID --hbase-create-table --username 'root' -P
Warning: /soft/bigdata/clouderamanager/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/04/28 15:54:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.0
Enter password: 
17/04/28 15:54:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/04/28 15:54:44 INFO tool.CodeGenTool: Beginning code generation
Fri Apr 28 15:54:44 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/04/28 15:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COMP_DICT` AS t LIMIT 1
17/04/28 15:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COMP_DICT` AS t LIMIT 1
17/04/28 15:54:45 INFO  HADOOP_MAPRED_HOME is /soft/bigdata/clouderamanager/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-hdfs/compile/f5c3b693ffb26b66c554308ad32b2880/COMP_DICT.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/04/28 15:54:47 INFO  Writing jar file: /tmp/sqoop-hdfs/compile/f5c3b693ffb26b66c554308ad32b2880/COMP_DICT.jar
……
7/04/28 15:54:53 INFO mapreduce.Job: The url to track the job: http://master2:8088/proxy/application_1491881598805_0027/
17/04/28 15:54:53 INFO mapreduce.Job: Running job: job_1491881598805_0027
17/04/28 15:54:59 INFO mapreduce.Job: Job job_1491881598805_0027 running in uber mode : false
17/04/28 15:54:59 INFO mapreduce.Job:  map 0% reduce 0%
17/04/28 15:55:05 INFO mapreduce.Job:  map 20% reduce 0%
17/04/28 15:55:06 INFO mapreduce.Job:  map 60% reduce 0%
17/04/28 15:55:09 INFO mapreduce.Job:  map 100% reduce 0%
17/04/28 15:55:10 INFO mapreduce.Job: Job job_1491881598805_0027 completed successfully
17/04/28 15:55:10 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=925010
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=665
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=5
        Other local map tasks=5
        Total time spent by all maps in occupied slots (ms)=25663
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=25663
        Total vcore-seconds taken by all map tasks=25663
        Total megabyte-seconds taken by all map tasks=26278912
    Map-Reduce Framework
        Map input records=10353
        Map output records=10353
        Input split bytes=665
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=586
        CPU time spent (ms)=17940
        Physical memory (bytes) snapshot=1619959808
        Virtual memory (bytes) snapshot=14046998528
        Total committed heap usage (bytes)=1686634496
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
17/04/28 15:55:10 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 20.0424 seconds (0 bytes/sec)
17/04/28 15:55:10 INFO mapreduce.ImportJobBase: Retrieved 10353 records.

PS:如果想把表中的主键也增加到hbase的cell中也就是列族中的列的话,需要设置如下参数:
sqoop.hbase.add.row.key=true

sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key "etl_date,APPLY_ID" --hbase-create-table --username 'root' -P

二、mysql同步到hive

1、创建与mysql相同结构的表:

2、mysql数据导入到hive中

Copyright © 2019- huatuowenda.com 版权所有 湘ICP备2023022495号-1

违法及侵权请联系:TEL:199 1889 7713 E-MAIL:2724546146@qq.com

本站由北京市万商天勤律师事务所王兴未律师提供法律服务