
Apache Kylin Advanced: Configuration

Published: 2022-03-18 20:52    Views: 173
In day-to-day operation of Apache Kylin, configuration parameters are usually tuned based on the logs produced at runtime, in order to improve performance and stability. The Kylin website does not explain these settings, so this post walks through Kylin's configuration.

There are four configuration files under ${KYLIN_HOME}/conf:

    kylin_hive_conf.xml
    kylin_job_conf_inmem.xml
    kylin_job_conf.xml
    kylin.properties

kylin_hive_conf.xml is the configuration Kylin uses when submitting tasks to Hive. kylin_job_conf_inmem.xml and kylin_job_conf.xml are the configurations Kylin uses when submitting tasks to YARN. Adjust them as appropriate for your environment. The important settings in kylin.properties are described below.

kylin.server.mode=all

The run mode of the Kylin server: all, job, or query. For their meaning, see: https://zhuanlan.zhihu.com/p/22219602?refer=dataeye

kylin.rest.servers=hostname1:7070,hostname2:7070,hostname3:7070

The list of Kylin instance servers. Note: this does not include server instances running in job mode!

kylin.metadata.url=kylin_metadata@hbase

The Kylin metadata configuration. For its meaning, see: https://zhuanlan.zhihu.com/p/22223631?refer=dataeye

kylin.job.retry=0

The number of retries for a Kylin job. Note: "job" here means the job generated by a cube build or refresh, not the MapReduce job of each individual step.

kylin.job.mapreduce.default.reduce.input.mb=500

The maximum input size, in MB, for each reducer when Kylin submits a job to Hadoop. This parameter is used to determine the number of reducers of the MapReduce job, as the following code shows:

    public double getDefaultHadoopJobReducerInputMB() {
        return Double.parseDouble(getOptional("kylin.job.mapreduce.default.reduce.input.mb", "500"));
    }

    protected void setReduceTaskNum(Job job, KylinConfig config, String cubeName, int level) throws ClassNotFoundException, IOException, InterruptedException, JobException {
        Configuration jobConf = job.getConfiguration();
        KylinConfig kylinConfig = KylinConfig.getInstanceFromEnv();

        CubeDesc cubeDesc = CubeManager.getInstance(config).getCube(cubeName).getDescriptor();
        kylinConfig = cubeDesc.getConfig();

        double perReduceInputMB = kylinConfig.getDefaultHadoopJobReducerInputMB();
        double reduceCountRatio = kylinConfig.getDefaultHadoopJobReducerCountRatio();

        // total map input MB
        double totalMapInputMB = this.getTotalMapInputMB();

        // output / input ratio
        int preLevelCuboids, thisLevelCuboids;
        if (level == 0) { // base cuboid
            preLevelCuboids = thisLevelCuboids = 1;
        } else { // n-cuboid
            int[] allLevelCount = CuboidCLI.calculateAllLevelCount(cubeDesc);
            preLevelCuboids = allLevelCount[level - 1];
            thisLevelCuboids = allLevelCount[level];
        }

        // total reduce input MB
        double totalReduceInputMB = totalMapInputMB * thisLevelCuboids / preLevelCuboids;

        // number of reduce tasks
        int numReduceTasks = (int) Math.round(totalReduceInputMB / perReduceInputMB * reduceCountRatio);

        // adjust reducer number for cube which has DISTINCT_COUNT measures for better performance
        if (cubeDesc.hasMemoryHungryMeasures()) {
            numReduceTasks = numReduceTasks * 4;
        }

        // at least 1 reducer
        numReduceTasks = Math.max(1, numReduceTasks);
        // no more than 5000 reducer by default
        numReduceTasks = Math.min(kylinConfig.getHadoopJobMaxReducerNumber(), numReduceTasks);

        jobConf.setInt(MAPRED_REDUCE_TASKS, numReduceTasks);

        logger.info("Having total map input MB " + Math.round(totalMapInputMB));
        logger.info("Having level " + level + ", pre-level cuboids " + preLevelCuboids + ", this level cuboids " + thisLevelCuboids);
        logger.info("Having per reduce MB " + perReduceInputMB + ", reduce count ratio " + reduceCountRatio);
        logger.info("Setting " + MAPRED_REDUCE_TASKS + "=" + numReduceTasks);
    }

Adjust this setting according to your data volume, your performance requirements, and the mapred-site.xml configuration of your Hadoop cluster.

kylin.job.run.as.remote.cmd=false

This setting controls whether Kylin issues CLI commands to Hadoop, HBase, Hive and so on over ssh. Kylin is normally deployed on a client machine of the Hadoop cluster, so the value is false. If the Kylin service is not deployed on a Hadoop client machine, set it to true. In that case Kylin needs the following settings in order to reach the Hadoop cluster:

    # Only necessary when kylin.job.run.as.remote.cmd=true
    kylin.job.remote.cli.hostname=

    # Only necessary when kylin.job.run.as.remote.cmd=true
    kylin.job.remote.cli.username=

    # Only necessary when kylin.job.run.as.remote.cmd=true
    kylin.job.remote.cli.password=

----------------------------------------------------------------------

The following setting is the maximum number of jobs Kylin executes concurrently:

kylin.job.concurrent.max.limit=10

The interval, in seconds, at which Kylin checks the status of the MapReduce tasks it has submitted to YARN:

kylin.job.yarn.app.rest.check.interval.seconds=10

The code is as follows:

    while (!isDiscarded()) {
        JobStepStatusEnum newStatus = statusChecker.checkStatus();
        if (status == JobStepStatusEnum.KILLED) {
            executableManager.updateJobOutput(getId(), ExecutableState.ERROR, Collections.<String, String> emptyMap(), "killed by admin");
            return new ExecuteResult(ExecuteResult.State.FAILED, "killed by admin");
        }
        if (status == JobStepStatusEnum.WAITING && (newStatus == JobStepStatusEnum.FINISHED || newStatus == JobStepStatusEnum.ERROR || newStatus == JobStepStatusEnum.RUNNING)) {
            final long waitTime = System.currentTimeMillis() - getStartTime();
            setMapReduceWaitTime(waitTime);
        }
        status = newStatus;
        executableManager.addJobInfo(getId(), hadoopCmdOutput.getInfo());
        if (status.isComplete()) {
            final Map<String, String> info = hadoopCmdOutput.getInfo();
            readCounters(hadoopCmdOutput, info);
            executableManager.addJobInfo(getId(), info);

            if (status == JobStepStatusEnum.FINISHED) {
                return new ExecuteResult(ExecuteResult.State.SUCCEED, output.toString());
            } else {
                return new ExecuteResult(ExecuteResult.State.FAILED, output.toString());
            }
        }
        Thread.sleep(context.getConfig().getYarnStatusCheckIntervalSeconds() * 1000);
    }

The following setting is the Hive database that holds the intermediate table created in the first step of a Kylin cube build:

kylin.job.hive.database.for.intermediatetable=default

The following is the compression algorithm used for the data stored in the HBase table that Kylin creates during a cube build:

kylin.hbase.default.compression.codec=snappy

Note: before setting this value, first check whether the Hadoop cluster that HBase points to supports the compression algorithm. The check command is:

    hadoop checknative -a

In this case the check showed that the Hadoop cluster does not support the snappy compression algorithm, so the default value has to be changed.
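To make the effect of kylin.job.mapreduce.default.reduce.input.mb more concrete, the reducer-count arithmetic from setReduceTaskNum can be pulled out into a small standalone sketch. This is not Kylin code: the class name ReducerCountSketch, the method estimateReducers, and all the input values (including the reducer caps of 500 and 200) are hypothetical and chosen only for illustration. It only mirrors the steps visible in the quoted method: scale the map input by the cuboid ratio, divide by the per-reducer input, multiply by 4 for memory-hungry measures, then clamp between 1 and the configured maximum.

```java
// Standalone sketch of the reducer-count arithmetic quoted from setReduceTaskNum.
// Class name, method name and all sample values are hypothetical, not Kylin APIs.
class ReducerCountSketch {

    static int estimateReducers(double totalMapInputMB,
                                int preLevelCuboids,
                                int thisLevelCuboids,
                                double perReduceInputMB,
                                double reduceCountRatio,
                                boolean hasMemoryHungryMeasures,
                                int maxReducers) {
        // Reduce input grows with the ratio of cuboids at this level to the previous level.
        double totalReduceInputMB = totalMapInputMB * thisLevelCuboids / preLevelCuboids;

        // One reducer per perReduceInputMB of input, scaled by the count ratio.
        int numReduceTasks = (int) Math.round(totalReduceInputMB / perReduceInputMB * reduceCountRatio);

        // The quoted code multiplies by 4 when the cube has memory-hungry measures.
        if (hasMemoryHungryMeasures) {
            numReduceTasks = numReduceTasks * 4;
        }

        // At least 1 reducer, and no more than the configured maximum.
        numReduceTasks = Math.max(1, numReduceTasks);
        return Math.min(maxReducers, numReduceTasks);
    }

    public static void main(String[] args) {
        // Base cuboid, 10000 MB of map input, 500 MB per reducer.
        System.out.println(estimateReducers(10000, 1, 1, 500, 1.0, false, 500));
        // Same input with 250 MB per reducer: twice as many reducers.
        System.out.println(estimateReducers(10000, 1, 1, 250, 1.0, false, 500));
        // Very small input still gets one reducer.
        System.out.println(estimateReducers(100, 1, 1, 500, 1.0, false, 500));
        // Memory-hungry measures quadruple the count until the cap applies.
        System.out.println(estimateReducers(10000, 10, 30, 500, 1.0, true, 200));
    }
}
```

Under these assumed inputs, halving the per-reducer input doubles the reducer count, which is why lowering this setting increases parallelism at the cost of more, smaller reduce tasks.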
