java - Cascading example will not compile?

In the shell, I typed gradle cleanJar in the Impatient/part1 directory. The output is below. The error is "class file for org.apache.hadoop.mapred.JobConf not found". Why does it not compile?

:clean UP-TO-DATE
:compileJava
Download http://conjars.org/repo/cascading/cascading-core/2.0.1/cascading-core-2.0.1.pom
Download http://conjars.org/repo/cascading/cascading-hadoop/2.0.1/cascading-hadoop-2.0.1.pom
Download http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.pom
Download http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.pom
Download http://repo1.maven.org/maven2/org/slf4j/slf4j-parent/1.6.1/slf4j-parent-1.6.1.pom
Download http://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.pom
Download http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.pom
Download http://repo1.maven.org/maven2/org/codehaus/janino/janino/2.5.16/janino-2.5.16.pom
Download http://conjars.org/repo/cascading/cascading-core/2.0.1/cascading-core-2.0.1.jar
Download http://conjars.org/repo/cascading/cascading-hadoop/2.0.1/cascading-hadoop-2.0.1.jar
Download http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.jar
Download http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
Download http://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar
Download http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.jar
Download http://repo1.maven.org/maven2/org/codehaus/janino/janino/2.5.16/janino-2.5.16.jar
/home/is_admin/lab/cascading/Impatient/part1/src/main/java/impatient/Main.java:50: error: cannot access JobConf
    Tap inTap = new Hfs( new TextDelimited( true, "\t" ), inPath );
                ^
  class file for org.apache.hadoop.mapred.JobConf not found
1 error
:compileJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 22.599 secs

hadoop - How to export data from Spark SQL to CSV

This command works in HiveQL:

insert overwrite directory '/data/home.csv' select * from testtable;

But with Spark SQL I get an error with an org.apache.spark.sql.hive.HiveQl stack trace:

java.lang.RuntimeException: Unsupported language features in query:
    insert overwrite directory '/data/home.csv' select * from testtable

Please guide me on how to write the export-to-CSV functionality in Spark SQL.
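For reference, a minimal sketch of one common workaround, assuming the Spark 2.x DataFrame API (on Spark 1.x the external spark-csv package provides the same "csv" format); the output directory is illustrative:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSqlToCsv {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SparkSqlToCsv")
                .enableHiveSupport()     // so "testtable" can be resolved from the Hive metastore
                .getOrCreate();

        // Run the query through the DataFrame API instead of HiveQL's
        // "insert overwrite directory", which Spark SQL rejects here.
        Dataset<Row> result = spark.sql("SELECT * FROM testtable");

        // Write the result out as CSV files under the given directory.
        result.write()
              .option("header", "true")
              .csv("/data/home_csv");    // illustrative output path

        spark.stop();
    }
}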


Hadoop cluster setup - java.net.ConnectException: Connection refused

I want to set up a hadoop cluster in pseudo-distributed mode. I managed to perform all the setup steps, including starting the Namenode, Datanode, Jobtracker and Tasktracker on my machine.

Then I tried to run some example programs and ran into the java.net.ConnectException: Connection refused error. I went back to the first steps of running some operations in standalone mode and ran into the same problem.

I even triple-checked all the installation steps and have no idea how to fix it. (I am new to Hadoop and also new to Ubuntu, so please take that into account when giving any guidance or tips.)

This is the error output I keep receiving:

hduser@marta-komputer:/usr/local/hadoop$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
15/02/22 18:23:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/22 18:23:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
java.net.ConnectException: Call From marta-komputer/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy9.delete(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:521)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.delete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1929)
    at org.apache.hadoop.hdfs.DistributedFileSystem$12.doCall(DistributedFileSystem.java:638)
    at org.apache.hadoop.hdfs.DistributedFileSystem$12.doCall(DistributedFileSystem.java:634)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:634)
    at org.apache.hadoop.examples.Grep.run(Grep.java:95)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.Grep.main(Grep.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 32 more

The etc/hadoop/hadoop-env.sh file:

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options.  Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol.  This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored.  $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by 
#       the user that will run the hadoop daemons.  Otherwise there is the
#       potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

The Hadoop-related fragment of the .bashrc file:

# -- HADOOP ENVIRONMENT VARIABLES START -- #
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# -- HADOOP ENVIRONMENT VARIABLES END -- #

The /usr/local/hadoop/etc/hadoop/core-site.xml file:

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop_tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

</configuration>

The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file:

<configuration>
<property>
      <name>dfs.replication</name>
      <value>1</value>
 </property>
 <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
 </property>
 <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
 </property>
</configuration>

The /usr/local/hadoop/etc/hadoop/yarn-site.xml file:

<configuration> 
<property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
</property>
<property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

The /usr/local/hadoop/etc/hadoop/mapred-site.xml file:

<configuration>
<property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
</property>
</configuration>

Running bin/hdfs namenode -format produces output as follows (I replaced part of it with (...)):

hduser@marta-komputer:/usr/local/hadoop$ bin/hdfs namenode -format
15/02/22 18:50:47 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = marta-komputer/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-3.0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli (...)2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.6.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.6.0.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1; compiled by 'jenkins' on 2014-11-13T21:10Z
STARTUP_MSG:   java = 1.8.0_31
************************************************************/
15/02/22 18:50:47 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/22 18:50:47 INFO namenode.NameNode: createNameNode [-format]
15/02/22 18:50:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-0b65621a-eab3-47a4-bfd0-62b5596a940c
15/02/22 18:50:48 INFO namenode.FSNamesystem: No KeyProvider found.
15/02/22 18:50:48 INFO namenode.FSNamesystem: fsLock is fair:true
15/02/22 18:50:48 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/02/22 18:50:48 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/02/22 18:50:48 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/02/22 18:50:48 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Feb 22 18:50:48
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map BlocksMap
15/02/22 18:50:48 INFO util.GSet: VM type       = 64-bit
15/02/22 18:50:48 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/02/22 18:50:48 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/02/22 18:50:48 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/02/22 18:50:48 INFO blockmanagement.BlockManager: defaultReplication         = 1
15/02/22 18:50:48 INFO blockmanagement.BlockManager: maxReplication             = 512
15/02/22 18:50:48 INFO blockmanagement.BlockManager: minReplication             = 1
15/02/22 18:50:48 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/02/22 18:50:48 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/02/22 18:50:48 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/02/22 18:50:48 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/02/22 18:50:48 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/02/22 18:50:48 INFO namenode.FSNamesystem: fsOwner             = hduser (auth:SIMPLE)
15/02/22 18:50:48 INFO namenode.FSNamesystem: supergroup          = supergroup
15/02/22 18:50:48 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/02/22 18:50:48 INFO namenode.FSNamesystem: HA Enabled: false
15/02/22 18:50:48 INFO namenode.FSNamesystem: Append Enabled: true
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map INodeMap
15/02/22 18:50:48 INFO util.GSet: VM type       = 64-bit
15/02/22 18:50:48 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/02/22 18:50:48 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/02/22 18:50:48 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map cachedBlocks
15/02/22 18:50:48 INFO util.GSet: VM type       = 64-bit
15/02/22 18:50:48 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/02/22 18:50:48 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/02/22 18:50:48 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/02/22 18:50:48 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/02/22 18:50:48 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/02/22 18:50:48 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/02/22 18:50:48 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/02/22 18:50:48 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/02/22 18:50:48 INFO util.GSet: VM type       = 64-bit
15/02/22 18:50:48 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/02/22 18:50:48 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/02/22 18:50:48 INFO namenode.NNConf: ACLs enabled? false
15/02/22 18:50:48 INFO namenode.NNConf: XAttrs enabled? true
15/02/22 18:50:48 INFO namenode.NNConf: Maximum size of an xattr: 16384
Re-format filesystem in Storage Directory /usr/local/hadoop_tmp/hdfs/namenode ? (Y or N) Y
15/02/22 18:50:50 INFO namenode.FSImage: Allocated new BlockPoolId: BP-948369552-127.0.1.1-1424627450316
15/02/22 18:50:50 INFO common.Storage: Storage directory /usr/local/hadoop_tmp/hdfs/namenode has been successfully formatted.
15/02/22 18:50:50 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/22 18:50:50 INFO util.ExitUtil: Exiting with status 0
15/02/22 18:50:50 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at marta-komputer/127.0.1.1
************************************************************/

Starting dfs and yarn produces the following output:

hduser@marta-komputer:/usr/local/hadoop$ start-dfs.sh
15/02/22 18:53:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-marta-komputer.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-marta-komputer.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-marta-komputer.out
15/02/22 18:53:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hduser@marta-komputer:/usr/local/hadoop$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-marta-komputer.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-marta-komputer.out

Calling jps shortly afterwards gives:

hduser@marta-komputer:/usr/local/hadoop$ jps
11696 ResourceManager
11842 NodeManager
11171 NameNode
11523 SecondaryNameNode
12167 Jps

The netstat output:

hduser@marta-komputer:/usr/local/hadoop$ sudo netstat -lpten | grep java
tcp        0      0 0.0.0.0:8088            0.0.0.0:*               LISTEN      1001       690283      11696/java      
tcp        0      0 0.0.0.0:42745           0.0.0.0:*               LISTEN      1001       684574      11842/java      
tcp        0      0 0.0.0.0:13562           0.0.0.0:*               LISTEN      1001       680955      11842/java      
tcp        0      0 0.0.0.0:8030            0.0.0.0:*               LISTEN      1001       684531      11696/java      
tcp        0      0 0.0.0.0:8031            0.0.0.0:*               LISTEN      1001       684524      11696/java      
tcp        0      0 0.0.0.0:8032            0.0.0.0:*               LISTEN      1001       680879      11696/java      
tcp        0      0 0.0.0.0:8033            0.0.0.0:*               LISTEN      1001       687392      11696/java      
tcp        0      0 0.0.0.0:8040            0.0.0.0:*               LISTEN      1001       680951      11842/java      
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      1001       687242      11171/java      
tcp        0      0 0.0.0.0:8042            0.0.0.0:*               LISTEN      1001       680956      11842/java      
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      1001       690252      11523/java      
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      1001       687239      11171/java  

The /etc/hosts file:

127.0.0.1       localhost
127.0.1.1       marta-komputer

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

====================================================

Update 1.

I updated core-site.xml and now I have:

<property>
<name>fs.default.name</name>
<value>hdfs://marta-komputer:9000</value>
</property>

But I keep getting the error, which now starts with:

15/03/01 00:59:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
java.net.ConnectException: Call From marta-komputer.home/192.168.1.8 to marta-komputer:9000 failed on connection exception:     java.net.ConnectException: Connection refused; For more details see:    http://wiki.apache.org/hadoop/ConnectionRefused

I also noticed that telnet localhost 9000 is not working:

hduser@marta-komputer:~$ telnet localhost 9000
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused

hadoop - How to access Hive from Python?

[https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python] appears to be outdated.

When I add the following to /etc/profile:

export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py

I can then do the imports listed in the link, except that from hive import ThriftHive actually needs to be:

from hive_service import ThriftHive

Next, the port in the example was 10000, which caused the program to hang when I tried it. The default Hive Thrift port is 9083, which stopped the hanging.

So I set it up like this:

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive
try:
    transport = TSocket.TSocket('<node-with-metastore>', 9083)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHive.Client(protocol)
    transport.open()
    client.execute("CREATE TABLE test(c1 int)")

    transport.close()
except Thrift.TException, tx:
    print '%s' % (tx.message)

I get the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 68, in execute
self.recv_execute()
File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 84, in recv_execute
raise x
thrift.Thrift.TApplicationException: Invalid method name: 'execute'

But inspecting the ThriftHive.py file shows that the execute method is present in the Client class.

How can I access Hive using Python?


Is it better to create a Hadoop Job with the mapred or the mapreduce package?

To create a MapReduce job, you can use either the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package with its Mappers, Reducers and Jobs. Now I would like to know: is it better to create jobs with the old mapred package or the new mapreduce package, and why? Or does it only depend on whether you need something like MultipleTextOutputFormat, which is only available in the old mapred package?
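For reference, a minimal sketch of what a job looks like when built with the newer org.apache.hadoop.mapreduce API, assuming Hadoop 2.x and the library TokenCounterMapper/IntSumReducer classes that ship with it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class NewApiWordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The new API configures jobs through org.apache.hadoop.mapreduce.Job
        // rather than org.apache.hadoop.mapred.JobConf/JobClient.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(NewApiWordCount.class);
        job.setMapperClass(TokenCounterMapper.class);   // library mapper bundled with Hadoop
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}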


How to overwrite existing files with the hadoop fs -copyToLocal command

Is there any way to overwrite existing files when copying from HDFS with:

hadoop fs -copyToLocal <HDFS PATH> <local path>

hadoop - How to load data from HDFS into Hive without removing the source file?

When loading data from HDFS into Hive using the

LOAD DATA INPATH 'hdfs_file' INTO TABLE tablename;

command, it looks like it moves hdfs_file into the hive/warehouse dir. Is it possible to copy it instead of moving it, so that the file can still be used by other processes (and if so, how)?
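One possible workaround, sketched below with the plain HDFS Java API (an EXTERNAL table whose LOCATION points at the data is another option): copy the file inside HDFS first and run LOAD DATA INPATH on the copy, so the original stays in place. The paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyBeforeLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path original = new Path("/user/data/hdfs_file");      // placeholder: the file to keep
        Path staging  = new Path("/user/data/hdfs_file_copy"); // placeholder: the copy Hive may consume

        // deleteSource = false keeps the original; LOAD DATA INPATH can then
        // move the copy into hive/warehouse without touching the source.
        FileUtil.copy(fs, original, fs, staging, false, conf);
    }
}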


hadoop - How to check the Spark version

I want to check the Spark version in CDH 5.7.0. I have searched the internet but could not figure it out. Please help.

Thanks


hadoop - Hive: loading a CSV with commas inside quoted fields

I am trying to load a CSV file into a Hive table like so:

CREATE TABLE mytable
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable;    


The CSV is comma (,) delimited and looks like this:

1, "some text, with comma in it", 123, "more text"

Since there is a comma inside the first string, this returns corrupted data.
Is there a way to set a text delimiter or to make Hive ignore the ',' inside strings?

I cannot change the delimiter of the CSV, since it comes from an external source.


hadoop - HBase: fast way to count rows

Right now I count the number of rows over a ResultScanner like this:

for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
    number++;
}

If the data reaches millions of rows, the computation becomes very large. I want to compute it in real time, and I do not want to use MapReduce.

How can I count the rows quickly?
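For reference, a minimal sketch of a common client-side optimization, assuming the HBase 1.x Java API: scan with a FirstKeyOnlyFilter and a larger caching value, so only the first cell of each row is shipped to the client. The table name is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class FastRowCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mytable"))) {

            Scan scan = new Scan();
            scan.setFilter(new FirstKeyOnlyFilter()); // return only the first cell of each row
            scan.setCaching(1000);                    // fetch many rows per RPC round trip

            long count = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result ignored : scanner) {
                    count++;
                }
            }
            System.out.println("rows: " + count);
        }
    }
}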


What is the relation between "mapreduce.map.memory.mb" and "mapred.map.child.java.opts" in Apache Hadoop YARN?

I would like to know the relation between the mapreduce.map.memory.mb and mapred.map.child.java.opts parameters.

Is it that mapreduce.map.memory.mb > mapred.map.child.java.opts?

Thanks, Kewal.
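For context, the commonly cited relationship is that the -Xmx heap in the java opts must fit inside the container size given by mapreduce.map.memory.mb, so it is usually set to roughly 80% of it. A minimal sketch using the MRv2 property names (mapreduce.map.java.opts is the newer name for the map-side java opts); the 2048/1638 values are illustrative:

import org.apache.hadoop.conf.Configuration;

public class MapMemorySettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Size, in MB, of the YARN container that runs a map task.
        conf.set("mapreduce.map.memory.mb", "2048");

        // JVM heap for the map task. It must be smaller than the container
        // size above, leaving headroom for non-heap JVM memory; ~80% is a
        // common rule of thumb.
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");

        System.out.println("container: " + conf.get("mapreduce.map.memory.mb") + " MB");
        System.out.println("map JVM opts: " + conf.get("mapreduce.map.java.opts"));
    }
}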


hadoop - How to access s3a:// files from Apache Spark?

Hadoop 2.6 does not support s3a out of the box, so I have tried a series of solutions and fixes, including:

  • deploying with hadoop-aws and aws-java-sdk => cannot read credentials from environment variables

  • adding hadoop-aws to maven => various transitive dependency conflicts

Has anyone managed to get both of these to work?
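Not a confirmed fix, but for reference a minimal sketch of the configuration usually attempted, assuming matching hadoop-aws and aws-java-sdk jars are on the classpath; the bucket, path and keys are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class S3AReadExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("S3AReadExample");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Point the Hadoop layer at the s3a filesystem and its credentials.
        sc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        sc.hadoopConfiguration().set("fs.s3a.access.key", "YOUR_ACCESS_KEY"); // placeholder
        sc.hadoopConfiguration().set("fs.s3a.secret.key", "YOUR_SECRET_KEY"); // placeholder

        JavaRDD<String> lines = sc.textFile("s3a://some-bucket/some/path/part-*"); // placeholder
        System.out.println("line count: " + lines.count());

        sc.stop();
    }
}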


hadoop - Writing a file to HDFS in Java

I want to create a file in HDFS and write data into it. I used the following code:

Configuration config = new Configuration();     
FileSystem fs = FileSystem.get(config); 
Path filenamePath = new Path("input.txt");  
try {
    if (fs.exists(filenamePath)) {
        fs.delete(filenamePath, true);
    }

    FSDataOutputStream fin = fs.create(filenamePath);
    fin.writeUTF("hello");
    fin.close();
} catch (IOException e) {
    e.printStackTrace();
}

It creates the file but does not write anything into it. I searched a lot but found nothing. What am I doing wrong? Do I need any permissions to write in HDFS?

Thanks.


hadoop - How to count the rows of an alias in PIG

I did something like this to count the number of rows of an alias in PIG:

logs = LOAD 'log';
logs_w_one = foreach logs generate 1 as one;
logs_group = group logs_w_one all;
logs_count = foreach logs_group generate SUM(logs_w_one.one);
dump logs_count;

This seems far too inefficient. If there is a better way, please enlighten me!


How to kill Hadoop jobs

When my code encounters an unhandled exception, I want to automatically kill all of my hadoop jobs. I am wondering what the best practice is.

Thanks
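One possible pattern, sketched with the newer org.apache.hadoop.mapreduce.Job API: keep a reference to the submitted job and kill it from the driver's catch block. Job setup is omitted and runDriverLogic is a hypothetical stand-in for the application's own code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class KillJobOnError {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example-job");
        // ... mapper/reducer and input/output setup omitted ...

        try {
            job.submit();
            runDriverLogic();            // hypothetical driver-side work that may throw
            job.waitForCompletion(true); // monitor the already-submitted job
        } catch (Exception e) {
            // On an unhandled exception, kill the job if it is still running.
            if (!job.isComplete()) {
                job.killJob();
            }
            throw e;
        }
    }

    private static void runDriverLogic() {
        // Placeholder for the application's own logic.
    }
}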


hdfs - Difference between hadoop fs -put and hadoop fs -copyFromLocal

-put and -copyFromLocal are documented as identical, yet most examples use the more verbose variant -copyFromLocal. Why?

The same goes for -get and -copyToLocal.


Where does the hadoop mapreduce framework send my System.out.print() statements? (stdout)

I want to debug a mapreduce script and, without going to much trouble, tried putting some print statements into my program. But I cannot seem to find them in any of the logs.


hadoop - Hive: cluster by vs. order by vs. sort by

As far as I understand:

  • sort by only sorts within the reducer

  • order by orders things globally, but pushes everything into a single reducer

  • cluster by intelligently distributes records into reducers by the key hash and does a sort by

So my question is: does cluster by guarantee a global order? distribute by puts the same keys into the same reducers, but what about adjacent keys?

The only documentation I can find on this is here, and from the example it seems like it orders them globally. But judging from the definition, I feel like it does not always do that.


hadoop - Namenode not starting

I was using Hadoop in pseudo-distributed mode and everything was working fine. But then I had to restart my computer for some reason. Now when I try to start the Namenode and the Datanode, I can only find the Datanode running. Could anyone tell me the possible reasons for this problem? Or am I doing something wrong?

I tried both bin/start-all.sh and bin/start-dfs.sh.


Java vs. Python on Hadoop

I am working on a project using Hadoop, and it seems to incorporate Java natively while providing streaming support for Python. Is there a significant performance impact from choosing one over the other? I am early enough in the process that I can go either way if there is a significant performance difference between the two approaches.

