You are using Hive and some of your tables are backed by the Avro Serde. You are seeing a lot of this:
WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
Sometimes this WARN becomes an ERROR and your queries completely fail! This is driving you mad as you have in fact specified the schema, and this is obvious because sometimes the queries work just fine, despite the warning. If this is you, read on!
The problem is that you’re defining the tables the wrong way, even if it’s what you have seen in every example on the internet. You’re doing it like this:
CREATE EXTERNAL TABLE mytable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs:///user/cloudera/mytable.avsc');
Aparently, this is no longer correct as of Hive 0.11. You now have to use SERDEPROPERTIES instead of TBLPROPERTIES, like this:
CREATE EXTERNAL TABLE mytable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.url'='hdfs:///user/cloudera/mytable.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
That’s it! Somehow, it was incredibly hard for me to find this information online, and I have actually stumbled upon it by chance here: http://technicaltidbit.blogspot.ro/2013/02/avro-tips.html . Give it a read for some quick AVRO tips!