Getting Started with Hadoop
Published: 2019-06-19


Hadoop

Setting up the Ubuntu environment

192.168.1.64 HNClient
192.168.1.65 HNName

The stock vi on SUSE and Ubuntu does not delete characters with the Backspace key.

To delete, press ESC and then x.
To insert text, press i.
To open a new line below the current one, press o.
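
If the Backspace behaviour bothers you, a common workaround (a minimal sketch, assuming the full vim package is acceptable in place of the default vim.tiny) is:

# install full vim and let Backspace delete across indents and line breaks
sudo apt-get install vim
echo "set backspace=indent,eol,start" >> ~/.vimrc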

On HNClient

norman@HNClient:~$ sudo vi /etc/hostname (the file should contain just the hostname)
HNClient
norman@HNClient:~$ sudo apt-get install openssh-server

norman@HNClient:~$ sudo vi /etc/hosts

192.168.1.64 HNClient
192.168.1.65 HNName
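
Before generating keys it is worth confirming that both names resolve (a quick check, not part of the original transcript):

# both should come back with the addresses from /etc/hosts
getent hosts HNName HNClient
ping -c 1 HNName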

norman@HNClient:~$ ssh-keygen (just press Enter at each prompt to accept the defaults)

Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:rj3kM5OeqxceqGP6DcofXa+hZFReLQmKqksqoYL+YH4 norman@HNClient
The key's randomart image is: (randomart not reproduced)

norman@HNClient:~$ ssh localhost (ssh to localhost still asks for a password)

norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

251 packages can be updated.

79 updates are security updates.

New release '18.04.1 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:14:08 2018 from 192.168.1.65

norman@HNClient:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

norman@HNClient:~$ ssh localhost (ssh to localhost no longer asks for a password)

Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

251 packages can be updated.

79 updates are security updates.

New release '18.04.1 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:18:02 2018 from 127.0.0.1

norman@HNClient:~$ ssh HNName (ssh to HNName still asks for a password)

norman@hnname's password:

norman@HNClient:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@HNName

norman@HNClient:~$ ssh HNName (can now log in to HNName without a password)

Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

254 packages can be updated.

79 updates are security updates.

New release '18.04.1 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:23:21 2018 from 192.168.1.64

norman@HNName:~$
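
The passwordless-SSH setup above boils down to generating one key pair and pushing the public key to every target host. A condensed sketch (hostnames assumed to match the /etc/hosts entries above):

# run once per node; ssh-copy-id appends the public key to the remote
# user's ~/.ssh/authorized_keys, which is what the manual cat >> did
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in localhost HNName HNClient; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub norman@$host
done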

On HNName

norman@HNName:~$ sudo vi /etc/hosts

192.168.1.64 HNClient
192.168.1.65 HNName

norman@HNName:~$ ssh-keygen (just press Enter at each prompt to accept the defaults)

Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:YXrPGdhKYkPsAroDlIZJ4sYdbrpHyvaMQccMV3GJn9I norman@HNName
The key's randomart image is: (randomart not reproduced)
norman@HNName:~$ ssh localhost (ssh to localhost still asks for a password)
norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

251 packages can be updated.

79 updates are security updates.

New release '18.04.1 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 22:55:29 2018 from 127.0.0.1

norman@HNName:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

norman@HNName:~$ ssh localhost (ssh to localhost no longer asks for a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

254 packages can be updated.

79 updates are security updates.

New release '18.04.1 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:00:28 2018 from 127.0.0.1

norman@HNName:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@hnclient

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/norman/.ssh/id_rsa.pub"
The authenticity of host 'hnclient (192.168.1.64)' can't be established.
ECDSA key fingerprint is SHA256:w5dwBrXor00JfFtpGXc0G/+deJJwmAxKmjXE32InhgA.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
norman@hnclient's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'norman@hnclient'"

and check to make sure that only the key(s) you wanted were added.

norman@HNName:~$ ssh hnclient (can now log in to hnclient without a password)

Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)

251 packages can be updated.

79 updates are security updates.

New release '18.04.1 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Wed Oct 31 23:05:13 2018 from 192.168.1.58

norman@HNClient:~$ exit

norman@HNName:~$ sudo apt-get install openjdk-7-jdk

[sudo] password for norman:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package openjdk-7-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'openjdk-7-jdk' has no installation candidate

This happens because the default Ubuntu 16.04 repositories no longer carry OpenJDK 7, so the PPA has to be added manually, as follows:

norman@HNName:~$ sudo add-apt-repository ppa:openjdk-r/ppa (adds the OpenJDK PPA; add-apt-repository ppa:xxx/ppa fetches a personal package archive, adds it to the current apt sources, and automatically imports its signing key)

norman@HNName:~$ sudo apt-get update
norman@HNName:~$ sudo apt-get install openjdk-7-jdk
norman@HNName:~$ java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-3)
OpenJDK Client VM (build 24.95-b01, mixed mode, sharing)
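
To find the right value for JAVA_HOME in hadoop-env.sh later on, you can ask where the java binary actually lives (a quick check, not part of the original transcript):

# JAVA_HOME should point at the JDK directory above the resolved java
# binary, e.g. /usr/lib/jvm/java-1.7.0-openjdk-i386
readlink -f $(which java)
update-alternatives --list java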

norman@HNName:~$ wget

norman@HNName:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNName:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNName:~$ dir /usr/local/hadoop
bin hadoop-ant-1.2.0.jar hadoop-tools-1.2.0.jar NOTICE.txt
build.xml hadoop-client-1.2.0.jar ivy README.txt
c++ hadoop-core-1.2.0.jar ivy.xml sbin
CHANGES.txt hadoop-examples-1.2.0.jar lib share
conf hadoop-minicluster-1.2.0.jar libexec src
contrib hadoop-test-1.2.0.jar LICENSE.txt webapps

norman@HNName:~$ sudo vi $HOME/.bashrc (append the following two lines at the end)

export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

norman@HNName:~$ exec bash

norman@HNName:~$ echo $PATH (verify that the new entry is on the PATH)
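
A quick way to confirm the PATH change took effect (a sketch, not in the original transcript):

which hadoop     # should print /usr/local/hadoop/bin/hadoop
hadoop version   # prints the Hadoop 1.2.0 banner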

norman@HNName:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386

# Extra Java runtime options. Empty by default. (Disable IPv6.)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

Installing Apache Hadoop (Single Node)

norman@HNName:~$ sudo vi /usr/local/hadoop/conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://HNName:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>

norman@HNName:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>HNName:10002</value>
  </property>
</configuration>
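
The tutorial never touches conf/hdfs-site.xml, so HDFS keeps its default replication factor of 3; with a single datanode that leaves every block under-replicated (visible in the fsck output further down). An optional fix, written here as a shell heredoc so it can be pasted in one go (a sketch, not part of the original setup):

sudo tee /usr/local/hadoop/conf/hdfs-site.xml > /dev/null <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF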

norman@HNName:~$ sudo mkdir /usr/local/hadoop/tmp

norman@HNName:~$ sudo chown norman /usr/local/hadoop/tmp
norman@HNName:~$ hadoop namenode -format (the following line indicates success)
18/11/01 19:07:36 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.

norman@HNName:~$ hadoop-daemons.sh start namenode (this produces the following error)

localhost: mkdir: cannot create directory '/usr/local/hadoop/libexec/../logs': Permission denied
localhost: chown: cannot access '/usr/local/hadoop/libexec/../logs': No such file or directory
localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 137: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: head: cannot open '/usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out' for reading: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 147: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 148: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory

norman@HNName:~$ ll /usr/local

total 44
drwxr-xr-x 11 root root 4096 Nov 1 02:02 ./
drwxr-xr-x 11 root root 4096 Feb 28 2018 ../
drwxr-xr-x 2 root root 4096 Feb 28 2018 bin/
drwxr-xr-x 2 root root 4096 Feb 28 2018 etc/
drwxr-xr-x 2 root root 4096 Feb 28 2018 games/
drwxr-xr-x 15 root root 4096 Nov 1 20:05 hadoop/
drwxr-xr-x 2 root root 4096 Feb 28 2018 include/
drwxr-xr-x 4 root root 4096 Feb 28 2018 lib/
lrwxrwxrwx 1 root root 9 Jul 26 23:29 man -> share/man/
drwxr-xr-x 2 root root 4096 Feb 28 2018 sbin/
drwxr-xr-x 8 root root 4096 Feb 28 2018 share/
drwxr-xr-x 2 root root 4096 Feb 28 2018 src/

norman@HNName:~$ sudo chown norman /usr/local/hadoop

norman@HNName:~$ ll /usr/local

total 44
drwxr-xr-x 11 root root 4096 Nov 1 02:02 ./
drwxr-xr-x 11 root root 4096 Feb 28 2018 ../
drwxr-xr-x 2 root root 4096 Feb 28 2018 bin/
drwxr-xr-x 2 root root 4096 Feb 28 2018 etc/
drwxr-xr-x 2 root root 4096 Feb 28 2018 games/
drwxr-xr-x 15 norman root 4096 Nov 1 20:05 hadoop/
drwxr-xr-x 2 root root 4096 Feb 28 2018 include/
drwxr-xr-x 4 root root 4096 Feb 28 2018 lib/
lrwxrwxrwx 1 root root 9 Jul 26 23:29 man -> share/man/
drwxr-xr-x 2 root root 4096 Feb 28 2018 sbin/
drwxr-xr-x 8 root root 4096 Feb 28 2018 share/
drwxr-xr-x 2 root root 4096 Feb 28 2018 src/

norman@HNName:~$ hadoop-daemons.sh start namenode

localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out

norman@HNName:~$ start-all.sh

norman@HNName:~$ jps
23297 DataNode
23610 TaskTracker
23484 JobTracker
23739 Jps
23102 NameNode
23416 SecondaryNameNode
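
Besides jps, the daemons can be probed through their web UIs (a quick check, assuming curl is installed and the Hadoop 1.x default ports):

# NameNode UI on 50070, JobTracker UI on 50030; both should return 200
curl -s -o /dev/null -w "%{http_code}\n" http://HNName:50070/
curl -s -o /dev/null -w "%{http_code}\n" http://HNName:50030/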

norman@HNName:~$ dir /usr/local/hadoop/bin

hadoop hadoop-daemon.sh rcc start-all.sh start-dfs.sh start-mapred.sh stop-balancer.sh stop-jobhistoryserver.sh task-controller
hadoop-config.sh hadoop-daemons.sh slaves.sh start-balancer.sh start-jobhistoryserver.sh stop-all.sh stop-dfs.sh stop-mapred.sh

Managing HDFS

(Download a text file)
Copy the web page contents into war_and_peace.txt.
(Download some arbitrary data)
QCLCD201701.zip and QCLCD201702.zip, then unzip them to get 201701hourly.txt and 201702hourly.txt.

On HNClient

Put war_and_peace.txt under /home/norman/data/book.
Put 201701hourly.txt and 201702hourly.txt under /home/norman/data/weather.

norman@HNClient:~$ sudo mkdir -p /home/norman/data/book

norman@HNClient:~$ sudo mkdir -p /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/book

norman@HNClient:~$ sudo add-apt-repository ppa:openjdk-r/ppa

norman@HNClient:~$ sudo apt-get update
norman@HNClient:~$ sudo apt-get install openjdk-7-jdk
norman@HNClient:~$ java -version
norman@HNClient:~$ wget
norman@HNClient:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNClient:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNClient:~$ sudo vi $HOME/.bashrc (append the following two lines at the end)
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

norman@HNClient:~$ exec bash

norman@HNClient:~$ echo $PATH (verify that the new entry is on the PATH)
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
# Extra Java runtime options. Empty by default. (Disable IPv6.)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://HNName:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>

norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>HNName:10002</value>
  </property>
</configuration>

norman@HNClient:~$ hadoop fs -mkdir test

norman@HNClient:~$ hadoop fs -ls
Found 1 items
drwxr-xr-x - norman supergroup 0 2018-11-02 01:17 /user/norman/test

norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/small

norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/big

Open http://192.168.1.65:50070 in a browser.


norman@HNClient:~$ hadoop fs -rmr test (test deleting a directory)

Deleted hdfs://HNName:10001/user/norman/test

norman@HNClient:~$ hadoop fs -moveFromLocal /home/norman/data/book/war_and_peace.txt hdfs://hnname:10001/data/small/war_and_peace.txt

The file can now be seen under /data/small.

norman@HNClient:~$ hadoop fs -copyToLocal hdfs://hnname:10001/data/small/war_and_peace.txt /home/norman/data/book/war_and_peace.bak.txt (test copying back to the local filesystem)

norman@HNClient:~$ hadoop fs -put /home/norman/data/weather hdfs://hnname:10001/data/big

The weather files can now be seen under /data/big; see the command-line check sketched below.
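
The screenshots that originally appeared here are not reproduced; the same thing can be verified from the command line (a small sketch):

hadoop fs -ls hdfs://hnname:10001/data/small
hadoop fs -ls hdfs://hnname:10001/data/big/weather
hadoop fs -du hdfs://hnname:10001/data/big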

norman@HNClient:~$ hadoop dfsadmin -report

Configured Capacity: 19033165824 (17.73 GB)
Present Capacity: 13114503168 (12.21 GB)
DFS Remaining: 12005150720 (11.18 GB)
DFS Used: 1109352448 (1.03 GB)
DFS Used%: 8.46%
Under replicated blocks: 19
Blocks with corrupt replicas: 0
Missing blocks: 0


Datanodes available: 1 (1 total, 0 dead)

Name: 192.168.1.65:50010

Decommission Status : Normal
Configured Capacity: 19033165824 (17.73 GB)
DFS Used: 1109352448 (1.03 GB)
Non DFS Used: 5918662656 (5.51 GB)
DFS Remaining: 12005150720(11.18 GB)
DFS Used%: 5.83%
DFS Remaining%: 63.07%
Last contact: Fri Nov 02 01:49:43 GMT-08:00 2018

norman@HNClient:~$ hadoop dfsadmin -safemode enter (safe mode is needed when upgrading)

Safe mode is ON

norman@HNClient:~$ hadoop dfsadmin -safemode leave

Safe mode is OFF
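
The current state can also be queried rather than toggled (same dfsadmin tool):

hadoop dfsadmin -safemode get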

On HNName

norman@HNName:~$ hadoop fsck -blocks
Status: HEALTHY
Total size: 1100586452 B
Total dirs: 13
Total files: 4
Total blocks (validated): 19 (avg. block size 57925602 B)
Minimally replicated blocks: 19 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 19 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 38 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Nov 02 01:54:46 GMT-08:00 2018 in 1049 milliseconds

The filesystem under path '/' is HEALTHY

norman@HNName:~$ hadoop fsck /data/big

Status: HEALTHY
Total size: 1097339705 B
Total dirs: 2
Total files: 2
Total blocks (validated): 17 (avg. block size 64549394 B)
Minimally replicated blocks: 17 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 17 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 34 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Nov 02 19:33:55 GMT-08:00 2018 in 14 milliseconds

The filesystem under path '/data/big' is HEALTHY
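
The "Under-replicated blocks: 100%" lines are a side effect of running a single datanode against the default replication factor of 3 (see the hdfs-site.xml note earlier). For files already in HDFS, the stored replication target can be lowered explicitly (a hedged sketch):

# recursively set the replication factor of everything under / to 1
hadoop fs -setrep -R 1 /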

Reposted from: https://blog.51cto.com/2290153/2312517

查看>>