Wednesday 11 September 2013

TeraValidate contains "checksum" output

I'm running TeraGen + TeraSort + TeraValidate jobs on both Hadoop 1.x and
Hadoop 2.x.
According to the TeraValidate API documentation, ANY output generated by the
run suggests that an error occurred.
That was true on Hadoop 1.x, where TeraValidate created two files:
_SUCCESS (the job ended successfully, even if the validation itself failed)
r-part-00000 (size 0 if the validation succeeded, otherwise larger than 0)
So naturally my automation scripts looked for this "r-part" file with size 0.
On Hadoop 2.x, however, TeraValidate generates output even when it succeeds.
The "r-part" file contains about 24 bytes of data: a single line reading
"checksum 2387cf87987sd".
What does this output mean, and does it indicate that the validation failed?
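One way an automation script could cope with both behaviours is to stop testing for a zero-size file and instead treat a lone "checksum" line as benign. The Python sketch below assumes, without confirmation from the Hadoop documentation, that a successful Hadoop 2.x run writes only that checksum line and that any other line signals a validation error. The function names validation_passed and check_output_dir are hypothetical, and the output directory is assumed to have been copied to local disk first.

```python
import glob
import os


def validation_passed(lines):
    """Return True if TeraValidate output lines look like a successful run.

    Assumption: on Hadoop 1.x a successful run writes nothing, and on
    Hadoop 2.x it writes only a single "checksum <value>" line. Any other
    non-blank line is treated as a validation error.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no information
        key = stripped.split(None, 1)[0]
        if key != "checksum":
            return False  # anything other than the checksum line counts as an error
    return True


def check_output_dir(path):
    """Check all reducer output files in a *local* copy of the TeraValidate output.

    Hypothetical setup: the HDFS output directory was first copied to local
    disk (e.g. with `hadoop fs -get`). Reducer files are matched with
    "*part*", which covers names such as part-00000 or r-part-00000 and
    skips the _SUCCESS marker.
    """
    files = sorted(glob.glob(os.path.join(path, "*part*")))
    if not files:
        return False  # no reducer output at all: treat as a failed or missing run
    lines = []
    for name in files:
        with open(name) as f:
            lines.extend(f.readlines())
    return validation_passed(lines)
```

With this check, an empty Hadoop 1.x file and a checksum-only Hadoop 2.x file both count as success, while any extra line still fails the run. Treating a missing output directory as a failure is a deliberate choice, so that a job that never wrote its output is not mistaken for a clean validation.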
