Output UTF-8 strings to STDERR when using Data::Dumper or Smart::Comments


Perl’s Data::Dumper and Smart::Comments are very useful during development. But when they process non-ASCII data, even with the utf8 pragma enabled and an encoding specified for STDERR, these modules print each character as a hexadecimal Unicode escape (for example \x{3053}) instead of the character itself. This problem can be solved with a $SIG{__WARN__} hook.

Problem when processing UTF-8 strings

When I process non-ASCII data, in my case Japanese, specifying the encoding of STDERR works for my own error messages in the main program. But the output of Smart::Comments and Data::Dumper is still written as hexadecimal escapes.

Output from Data::Dumper

use strict;
use warnings;
use utf8;
use Data::Dumper;

binmode STDERR, ':encoding(utf8)'; # This works for messages written directly in the program.

warn "日本語エラーメッセージ。\n";

my @arr = qw(
	こんな値や
	あんな値
);
warn Dumper(\@arr);

Output

日本語エラーメッセージ。
$VAR1 = [
          "\x{3053}\x{3093}\x{306a}\x{5024}\x{3084}",
          "\x{3042}\x{3093}\x{306a}\x{5024}"
        ];
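
The encoding layer on STDERR has nothing to convert here, because the string returned by Dumper already consists of ASCII characters only: the escapes are literal backslash sequences, not wide characters. The following is a minimal sketch to check this; the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;
use Data::Dumper;

my $dumped = Dumper(['あんな値']);

# 1 when every character in the dump is plain ASCII
my $is_ascii = $dumped !~ /[^\x00-\x7F]/ ? 1 : 0;

# 1 when the dump contains literal \x{...} escape sequences
my $has_escapes = $dumped =~ /\\x\{[0-9a-f]+\}/i ? 1 : 0;

print "ASCII only: $is_ascii, has escapes: $has_escapes\n";
```

Both values come out as 1, which is why setting the encoding of STDERR alone does not change the dump.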

How to convert the output of Data::Dumper to UTF-8 strings

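A $SIG{__WARN__} handler can solve it: the handler receives every message passed to warn, converts the hexadecimal escapes back into characters, and emits the result again.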

use strict;
use warnings;
use utf8;
use Data::Dumper;

binmode STDERR, ':encoding(utf8)';

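# Convert escapes emitted by debugging modules back into characters
local $SIG{__WARN__} = sub {
	# The __WARN__ hook is disabled while the handler runs, so this warn does not recurse
	warn join("",
		map {
			my $str = $_;
			$str =~ s/\\x\{([0-9A-Fa-f]+)\}/chr(hex($1))/eg; # \x{abcd} -> character (any number of hex digits)
			$str;
		} @_
	);
};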

my @arr = qw(
	こんな値や
	あんな値
);
warn Dumper(\@arr);

Output

$VAR1 = [
          "こんな値や",
          "あんな値"
        ];
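
The substitution used inside the handler can also be tried on its own, without warn or Data::Dumper. The following is a minimal sketch under some assumptions: the helper name unescape_hex is hypothetical, and it matches one or more hexadecimal digits so that shorter escapes such as \x{e9} and longer ones such as \x{1f600} are converted as well.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: turn Data::Dumper-style \x{...} escapes back into characters
sub unescape_hex {
    my ($text) = @_;
    # hex() turns the captured digits into a code point, chr() into a character
    $text =~ s/\\x\{([0-9A-Fa-f]+)\}/chr(hex($1))/eg;
    return $text;
}

binmode STDOUT, ':encoding(utf8)';

# Single quotes keep the backslash sequences literal, as in Dumper's output
my $escaped  = '"\x{3042}\x{3093}\x{306a}\x{5024}"';
my $restored = unescape_hex($escaped);

print "$restored\n";
```

Keeping the conversion in a separate function makes it easy to test, and the same function could be called from the $SIG{__WARN__} handler.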

The same handler works for Smart::Comments, since that module uses Data::Dumper internally.