Notes on a Hive UDF serialization exception

1. Background

I needed a UDF that writes every row of a Hive query result to a Redis cluster. As long as the query had a limit clause, everything worked.

After packaging with maven-shade-plugin, strip the signature files from the jar: zip -d target/validater-1.0-SNAPSHOT.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF  
A jar built with maven-assembly-plugin needs no such cleanup.

Then, in the Hive CLI:
hive>add jar {localPath}/test.jar;  
hive>CREATE TEMPORARY FUNCTION mytest AS '{path}.udf.HiveToRedisUDF';  
hive>select mytest(field1, field2) from {hiveTableName} where dt >= sysdate(-1) limit 10000;  

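If I remove the limit, however, the query throws an exception (see the end of this post).

The error also goes away if I do not pull in the jedis dependency.

So there are two questions:

  • Why does serialization fail?
  • Why does the Hive query fail without a limit clause but succeed with one?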

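2. Investigation

Is it a dependency conflict?

My first guess was a dependency conflict, so I looked up the GenericKeyedObjectPool class. It shows up in two versions of commons-pool2 (2.4.2 and 2.4.1) and also in commons-pool 1.5.1. I used dependencyManagement to remove the version conflict and then relocated the package with the shade plugin, but the problem remained exactly the same.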
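
When several versions of a class are on the classpath, one quick check is to ask the JVM which jar actually supplied it. The helper below is a minimal sketch of that check, not part of the original investigation; ClassOrigin is a hypothetical name, and the result is only meaningful if the helper runs with the same classpath as the UDF.

```java
import java.security.CodeSource;

class ClassOrigin {
    // Returns the jar or directory a class was loaded from, or a placeholder
    // when the JVM reports no code source (typical for JDK core classes).
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, for example the pool class named in the stack trace.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locate(Class.forName(name)));
    }
}
```

If the printed location is the shaded jar, the relocated copy is the one in use and a version clash becomes a less likely explanation.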

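Is the reflection failure unrelated to the conflict?

    at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:32)
This frame shows where the reflective call failed: Kryo was constructing a FieldSerializer for the class. The innermost cause in the trace is an ArrayIndexOutOfBoundsException thrown from FieldSerializerGenericsUtil.newCachedFieldOfGenericType, which suggests a problem in Kryo's handling of the class's fields rather than a class-loading conflict.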

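3. Fix and cause

Question 1: cause of the serialization failure

After a lot of searching and reading source code, I first got the query running by adding one setting: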

set hive.plan.serialization.format=javaXML;
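The description of hive.plan.serialization.format is shown below; it accepts either kryo or javaXML. My guess is that Kryo has trouble serializing the Redis client's dependencies, while serializing the plan as javaXML avoids the problem.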

<property>
  <name>hive.plan.serialization.format</name>
  <value>kryo</value>
  <description>
    Query plan format serialization between client and task nodes.
    Two supported values are : kryo and javaXML. Kryo is default.
  </description>
</property>

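The Kryo failure itself has not been pinned down yet. Kryo has a long record of this kind of thing: it trades robustness for speed, and serialization or deserialization failures turn up fairly often as a result. When time allows I will dig into how Kryo serializes jedis (the version modified in-house). A simple example I wrote myself did not reproduce the problem.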
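
The serialization trace shows Kryo walking from the UDF's redisWriter field all the way down to the connection pool. A common way to keep a client object out of the serialized plan is to mark the field transient and create the client lazily on first use; Kryo's FieldSerializer skips transient fields by default. This is an assumption about what would help here, not something verified against the modified jedis client. The sketch below uses Java's built-in serialization as a stand-in for Hive's shaded Kryo, and FakeRedisClient and LazyClientUdf are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the Redis client. It is deliberately NOT Serializable.
class FakeRedisClient {
    private int writes = 0;

    void set(String key, String value) { writes++; }
    int writes() { return writes; }
}

// Hypothetical stand-in for the UDF; in Hive it would extend UDF or GenericUDF.
class LazyClientUdf implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: left out of the serialized form, so the client never enters the plan.
    private transient FakeRedisClient client;

    // Creates the client on first use, i.e. on whichever node actually processes rows.
    private FakeRedisClient client() {
        if (client == null) {
            client = new FakeRedisClient();
        }
        return client;
    }

    String evaluate(String key, String value) {
        client().set(key, value);
        return key;
    }

    boolean clientInitialized() { return client != null; }

    // Serializes and deserializes the object, mimicking the plan being shipped to a task node.
    static LazyClientUdf roundTrip(LazyClientUdf udf) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(udf);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (LazyClientUdf) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        LazyClientUdf udf = new LazyClientUdf();
        udf.evaluate("k1", "v1");                // client now exists on the submitting side
        LazyClientUdf shipped = roundTrip(udf);  // works because the field is transient
        System.out.println(shipped.clientInitialized()); // prints false
        shipped.evaluate("k2", "v2");            // client is rebuilt lazily on the receiving side
        System.out.println(shipped.clientInitialized()); // prints true
    }
}
```

Without the transient modifier, the round trip in this sketch fails with a NotSerializableException, which plays the same role as the Kryo error in the trace: the serializer is asked to handle an object that was never meant to travel with the plan.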

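Question 2: the difference between running with and without limit

As far as I can tell, with limit the query runs on a single machine, so the plan is never serialized and passed between nodes. This can be checked by running in local mode, or from the logs, which show that no MapReduce job is created. That fits the stack trace: the exception comes out of Utilities.serializePlan, called from Utilities.setBaseWork while the MapWork plan is being serialized, a step the limit query apparently never reaches.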

Appendix: exception log

Serialization trace:  
internalPool (cli.driver.pool.ConnectionPool)  
shardPool (cli.driver.jedis.JedisConnectionFactory)  
connectionFactory (cli.Cluster)  
redisClient (validater.udf.JimdbWriterSingle)  
redisWriter (validater.udf.HiveToJimdbUDF)  
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)  
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)  
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)  
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)  
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:82)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:614)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:91)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:17)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:614)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:78)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:614)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:91)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:17)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:538)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:474)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:520)
    at org.apache.hadoop.hive.ql.exec.Utilities.serializeObjectByKryo(Utilities.java:1063)
    at org.apache.hadoop.hive.ql.exec.Utilities.serializePlan(Utilities.java:950)
    at org.apache.hadoop.hive.ql.exec.Utilities.serializePlan(Utilities.java:962)
    at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:722)
    ... 23 more
Caused by: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for class: org.shadecommon.apache.commons.pool2.impl.GenericKeyedObjectPool  
    at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:45)
    at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:26)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.newDefaultSerializer(Kryo.java:343)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.getDefaultSerializer(Kryo.java:336)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.registerImplicit(DefaultClassResolver.java:56)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:476)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:503)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:57)
    ... 62 more
Caused by: java.lang.reflect.InvocationTargetException  
    at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:32)
    ... 70 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0  
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializerGenericsUtil.newCachedFieldOfGenericType(FieldSerializerGenericsUtil.java:199)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.newCachedField(FieldSerializer.java:279)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.createCachedFields(FieldSerializer.java:243)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:188)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:109)
    ... 74 more