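Hadoop Deployment

Hadoop deployment steps (for now, single-node deployment on Windows only)
A note up front: for most errors, check the JDK and Hadoop versions first, and only then the winutils version and the configuration. I assume you can read the output and the error messages.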

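Download the JDK and Hadoop

This article uses JDK 11 and Hadoop 3.3.5 throughout. I first tried JDK 17, but it produced many runtime errors, and I finally got stuck on this one, which I could not resolve: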

 ERROR nodemanager.NodeManager: Error starting NodeManager
java.lang.ExceptionInInitializerError
        at com.google.inject.internal.cglib.reflect.$FastClassEmitter.<init>(FastClassEmitter.java:67)
        at com.google.inject.internal.cglib.reflect.$FastClass$Generator.generateClass(FastClass.java:72)
        at com.google.inject.internal.cglib.core.$DefaultGeneratorStrategy.generate(DefaultGeneratorStrategy.java:25)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator.create(AbstractClassGenerator.java:216)
        at com.google.inject.internal.cglib.reflect.$FastClass$Generator.create(FastClass.java:64)
        at com.google.inject.internal.BytecodeGen.newFastClass(BytecodeGen.java:204)
        at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.<init>(ProviderMethod.java:256)
        at com.google.inject.internal.ProviderMethod.create(ProviderMethod.java:71)
        at com.google.inject.internal.ProviderMethodsModule.createProviderMethod(ProviderMethodsModule.java:275)
        at com.google.inject.internal.ProviderMethodsModule.getProviderMethods(ProviderMethodsModule.java:144)
        at com.google.inject.internal.ProviderMethodsModule.configure(ProviderMethodsModule.java:123)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:349)
        at com.google.inject.AbstractModule.install(AbstractModule.java:122)
        at com.google.inject.servlet.ServletModule.configure(ServletModule.java:52)
        at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
        at com.google.inject.spi.Elements.getElements(Elements.java:110)
        at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
        at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
        at com.google.inject.Guice.createInjector(Guice.java:96)
        at com.google.inject.Guice.createInjector(Guice.java:73)
        at com.google.inject.Guice.createInjector(Guice.java:62)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:417)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:465)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:461)
        at org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer.serviceStart(WebServer.java:125)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:122)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:963)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @6025e1b6
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
        at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
        at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
        at com.google.inject.internal.cglib.core.$ReflectUtils$2.run(ReflectUtils.java:56)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:318)
        at com.google.inject.internal.cglib.core.$ReflectUtils.<clinit>(ReflectUtils.java:46)
        ... 32 more
2023-10-12 15:56:20,882 INFO ipc.Server: Stopping server on 8122
2023-10-12 15:56:20,882 INFO ipc.Server: Stopping IPC Server listener on 0
2023-10-12 15:56:20,882 INFO ipc.Server: Stopping IPC Server Responder
2023-10-12 15:56:20,883 WARN monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2023-10-12 15:56:20,889 INFO ipc.Server: Stopping server on 8040
2023-10-12 15:56:20,890 INFO ipc.Server: Stopping IPC Server listener on 8040
2023-10-12 15:56:20,890 INFO ipc.Server: Stopping IPC Server Responder
2023-10-12 15:56:20,890 WARN nodemanager.NodeResourceMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is interrupted. Exiting.
2023-10-12 15:56:20,890 INFO localizer.ResourceLocalizationService: Public cache exiting
2023-10-12 15:56:20,891 INFO impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2023-10-12 15:56:20,892 INFO impl.MetricsSystemImpl: NodeManager metrics system stopped.
2023-10-12 15:56:20,892 INFO impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2023-10-12 15:56:20,893 INFO nodemanager.NodeManager: SHUTDOWN_MSG:

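Roughly speaking, the reflection that the Guice package relies on is restricted in JDK 17. I tried adding the JVM argument --add-opens java.base/java.lang=ALL-UNNAMED, which did not help, and JDK 17 no longer supports --illegal-access=permit either. So far I have not found a way to run Hadoop on JDK 17.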
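
Since the JDK version is the first thing to rule out, here is a minimal Python sketch for reading the major version out of the text that java -version prints. The function names parse_java_major and installed_java_major are my own for illustration, and the version-string formats handled are the two common ones ("1.8.0_xxx" for Java 8 and earlier, "11.0.x" or "17.0.x" for later releases).

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in the output")
    parts = match.group(1).split(".")
    # Java 8 and earlier report "1.8.0_xxx"; Java 9 and later report "11.0.20", "17.0.8", ...
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    # Keep only the leading digits so strings such as "17-ea" still parse.
    return int(re.match(r"\d+", parts[0]).group())


def installed_java_major():
    """Run `java -version` and return the major version of the java on Path."""
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)
```

Calling installed_java_major() should return 11 for the setup used in this article. Anything other than 8 or 11 is a hint to point hadoop-env.cmd at a different JDK, as described further down.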

Download winutils

Download the winutils build that matches your Hadoop version.

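Add environment variables

Add HADOOP_HOME and HADOOP_USER_NAME:

  • HADOOP_HOME="your Hadoop directory, for example D:/envs/hadoop-3.3.5"
  • HADOOP_USER_NAME="root"
  • JAVA_HOME="your JDK path". If you already have a JDK installed but it is a newer version such as 17, you do not need to change this; see the hadoop-env.cmd step later on.

Add to Path: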

  • %JAVA_HOME%\bin
  • %JAVA_HOME%\jre\bin (only exists in JDK 8; JDK 11 has no jre directory, so this entry can be skipped)
  • %HADOOP_HOME%\bin
  • %HADOOP_HOME%\sbin
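
The variables and Path entries above can be checked with a short Python sketch like the one below. The function name check_environment is hypothetical, and it only looks at JAVA_HOME\bin, HADOOP_HOME\bin and HADOOP_HOME\sbin. It uses ntpath so that the comparison follows Windows path rules (case-insensitive, either slash direction).

```python
import ntpath  # Windows path rules, whatever OS this script runs on


def check_environment(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    # Windows usually spells the variable "Path"; accept either spelling.
    raw_path = env.get("Path") or env.get("PATH") or ""
    on_path = {
        ntpath.normcase(ntpath.normpath(entry))
        for entry in raw_path.split(";")
        if entry
    }
    for var, subdirs in (("JAVA_HOME", ("bin",)), ("HADOOP_HOME", ("bin", "sbin"))):
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
            continue
        for sub in subdirs:
            wanted = ntpath.normcase(ntpath.normpath(ntpath.join(home, sub)))
            if wanted not in on_path:
                problems.append(f"%{var}%\\{sub} is not on Path")
    return problems
```

On a real machine, pass os.environ (after import os) and open a new terminal first, because already-open terminals do not see newly added variables. An empty list means nothing was flagged.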

Verify the installation

Run the following commands in a terminal:

java -version
hadoop version

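Both commands should print version information (here, Java 11 and Hadoop 3.3.5).

Edit the Hadoop configuration files

These files are in the etc/hadoop folder.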

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- Use localhost to refer to the local machine -->
    <value>hdfs://localhost:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Make sure this directory exists and has appropriate permissions -->
    <value>/D:/envs/hadoop-3.3.5/tmp</value>
  </property>

  <property>
    <name>hadoop.http.staticuser.user</name>
    <!-- Using root may not be best practice unless you have a clear reason for it; consider a regular user account -->
    <value>root</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

hdfs-site.xml

Create a data directory under the Hadoop directory.
Inside that data directory, create a namenode directory.

<configuration>
  <!-- Set to 1 because this is a single-node Hadoop -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/D:/envs/hadoop-3.3.5/data/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/D:/envs/hadoop-3.3.5/data/snn</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>/D:/envs/hadoop-3.3.5/data/snn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/D:/envs/hadoop-3.3.5/data/datanode</value>
  </property>
</configuration>
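
The directories referenced in hdfs-site.xml can be created in one go with a small Python sketch. The function name create_data_dirs is hypothetical. The steps above only ask for data and data/namenode; creating data/datanode and data/snn up front as well is my own assumption, on the basis that the configuration points at them.

```python
from pathlib import Path


def create_data_dirs(hadoop_home):
    """Create data/namenode, data/datanode and data/snn under the Hadoop directory."""
    created = []
    for name in ("namenode", "datanode", "snn"):
        target = Path(hadoop_home) / "data" / name
        # parents=True also creates the data directory itself;
        # exist_ok=True makes the call safe to repeat.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

For the layout used in this article the call would be create_data_dirs("D:/envs/hadoop-3.3.5").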

yarn-site.xml

<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
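
Typos in property names are easy to miss in these four files, so it can help to load each one and look at what was actually parsed. The following is a minimal Python sketch using only the standard library. The function name read_site_xml is hypothetical, and the only check it performs is for stray whitespace around a property name.

```python
import xml.etree.ElementTree as ET


def read_site_xml(xml_text):
    """Return (properties, warnings) for the text of a Hadoop *-site.xml file."""
    properties = {}
    warnings = []
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if name != name.strip():
            warnings.append(f"property name {name!r} has leading or trailing whitespace")
        properties[name.strip()] = value
    return properties, warnings
```

To use it, read a file such as etc/hadoop/hdfs-site.xml as UTF-8 text and pass the text in. Printing the returned dictionary gives a quick view of every name and value, which makes a misspelled property easier to spot than reading the raw XML.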

hadoop-env.cmd

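Find the line set JAVA_HOME=%JAVA_HOME%. If your system JDK is already Java 8 or 11, leave it unchanged. If it is 17 or similar, change it to the path of the JDK 11 you just downloaded, for example: set JAVA_HOME=C:\Java\jdk-11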

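Replace the bin directory with winutils

Just copy the winutils files over Hadoop's bin directory. As long as the versions match, that is all.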
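
To confirm the copy landed in the right place, a quick Python sketch like this one lists which native files are absent from the bin directory. The function name missing_native_files is hypothetical, and checking for exactly winutils.exe and hadoop.dll is my assumption about which files matter most. The winutils package contains more than these two.

```python
from pathlib import Path


def missing_native_files(hadoop_home, required=("winutils.exe", "hadoop.dll")):
    """Return the required file names that are not present in <hadoop_home>/bin."""
    bin_dir = Path(hadoop_home) / "bin"
    return [name for name in required if not (bin_dir / name).is_file()]
```

An empty list from missing_native_files("D:/envs/hadoop-3.3.5") means both files were found. It says nothing about whether they match the Hadoop version.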

Format the storage directory

Run in a terminal:
hdfs namenode -format
At this point the configuration should be complete.

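Start Hadoop

Run start-all.cmd in a terminal. Running jps afterwards should list the Hadoop daemons (for a single-node setup, typically NameNode, DataNode, ResourceManager and NodeManager).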
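
If one of the daemon windows closes right after start-all.cmd, it is easy to overlook. The following is a rough Python sketch that compares jps output against the daemons expected on a single node. The function name missing_daemons is hypothetical, and the expected set (NameNode, DataNode, ResourceManager, NodeManager) is my assumption for this single-node setup.

```python
EXPECTED_DAEMONS = ("NameNode", "DataNode", "ResourceManager", "NodeManager")


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in the jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <main class name>"; the name can be absent.
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(set(expected) - running)
```

Feed it the text printed by jps, for example captured with subprocess.run(["jps"], capture_output=True, text=True).stdout. Any name it returns is a daemon whose own console window or log is worth reading first.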

Service management

View the cluster node status

http://127.0.0.1:8088/

View the file management page

http://127.0.0.1:9870/
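
If a page does not load, it helps to know whether anything is listening on that port at all. Here is a small Python sketch that tries a plain TCP connection to the two ports above. The function names is_port_open and web_ui_status are hypothetical, and an open port only shows that something is listening there, not that it is Hadoop.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def web_ui_status(host="127.0.0.1", ports=(8088, 9870)):
    """Map each web UI port to whether it currently accepts connections."""
    return {port: is_port_open(host, port) for port in ports}
```

Calling web_ui_status() returns a dictionary keyed by port. A False for 8088 points at the ResourceManager and a False for 9870 points at the NameNode, since those are the daemons serving the two pages.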