Avro Backed Hive Table in CDH5 (or CDH4.5): Encountered AvroSerdeException determining schema

You are using Hive and some of your tables are backed by the Avro Serde. You are seeing a lot of this:

WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. 
Returning signal schema to indicate problem org.apache.hadoop.hive.serde2.avro.AvroSerdeException: 
Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema

Sometimes this WARN becomes an ERROR and your queries completely fail! This is driving you mad as you have in fact specified the schema, and this is obvious because sometimes the queries work just fine, despite the warning. If this is you, read on!

The problem is that you’re defining the tables the wrong way, even if it’s what you have seen in every example on the internet. You’re doing it like this:

CREATE EXTERNAL TABLE mytable
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='hdfs:///user/cloudera/mytable.avsc');

Aparently, this is no longer correct as of Hive 0.11. You now have to use SERDEPROPERTIES instead of TBLPROPERTIES, like this:

CREATE EXTERNAL TABLE mytable
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
  'avro.schema.url'='hdfs:///user/cloudera/mytable.avsc')     
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

That’s it! Somehow, it was incredibly hard for me to find this information online, and I have actually stumbled upon it by chance here: http://technicaltidbit.blogspot.ro/2013/02/avro-tips.html . Give it a read for some quick AVRO tips!

2 thoughts on “Avro Backed Hive Table in CDH5 (or CDH4.5): Encountered AvroSerdeException determining schema

  1. Can you fix the following line in your example?

    ‘avro.schema.url’=’hdfs:///user/cloudera/mytable.avscORED as INPUTFORMAT

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s