| Class | String |
| In: | lib/taskjuggler/UTF8String.rb |
| Parent: | Object |
This is an extension and modification of the standard String class. We do a lot of UTF-8 character processing in the parser. Ruby 1.8 does not have good enough UTF-8 support and Ruby 1.9 only handles UTF-8 characters as Strings. This is very inefficient compared to representing them as Fixnum objects. Some of these hacks can be removed once we have switched to 1.9 support only.
| << | -> | old_double_left_angle |
| reverse | -> | old_reverse |
| each_char | -> | each_utf8_char |
| length | -> | length_utf8 |
Replacement for the existing << operator that also works for character codes above 255, i.e. multi-byte UTF-8 characters whose bytes are packed into a single Fixnum.
    # File lib/taskjuggler/UTF8String.rb, line 62
    def << (obj)
      if obj.is_a?(String) || (obj < 256)
        # In this case we can use the built-in concat.
        concat(obj)
      else
        # UTF-8 characters have a maximum length of 4 byte and no byte is 0.
        mask = 0xFF000000
        pos = 3
        while pos >= 0
          # Use the built-in concat operator for each byte.
          concat((obj & mask) >> (8 * pos)) if (obj & mask) != 0
          # Move mask and position to the next byte.
          mask = mask >> 8
          pos -= 1
        end
      end
    end
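The byte-splitting loop in the listing can be tried in isolation. The following is a minimal sketch for current Ruby versions, not TaskJuggler code: `packed_utf8_to_string` and `pack_utf8` are hypothetical helper names, and the sketch assumes the integer holds the UTF-8 bytes of one character with the most significant byte first, as the mask arithmetic in the listing suggests.

```ruby
# Hypothetical helper (not part of TaskJuggler): turn an Integer that holds
# the UTF-8 bytes of one character, most significant byte first, back into
# a one-character String.
def packed_utf8_to_string(packed)
  # Extract the four possible bytes from the highest down to the lowest.
  bytes = 3.downto(0).map { |pos| (packed >> (8 * pos)) & 0xFF }
  # No byte of a UTF-8 character is 0, so zero bytes are only padding.
  bytes.reject(&:zero?).pack('C*').force_encoding(Encoding::UTF_8)
end

# Hypothetical inverse: pack the bytes of a single character into an Integer.
def pack_utf8(char)
  char.bytes.inject(0) { |acc, byte| (acc << 8) | byte }
end

packed = pack_utf8("\u00E9")            # bytes 0xC3 0xA9 become 0xC3A9
char   = packed_utf8_to_string(packed)  # back to the original character
```

The sketch builds a new String instead of redefining `<<`, because on current Ruby versions appending an Integer to a UTF-8 String treats it as a code point rather than as a raw byte.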
Iterate over the String, calling the block once for each UTF-8 character. This implementation looks more awkward but is noticeably faster than the commonly suggested regexp-based implementations.
    # File lib/taskjuggler/UTF8String.rb, line 31
    def each_utf8_char
      c = ''
      length = 0
      each_byte do |b|
        c << b
        if length > 0
          # subsequent unicode byte
          if (length -= 1) == 0
            # end of unicode character reached
            yield c
            c = ''
          end
        elsif (b & 0xC0) == 0xC0
          # first unicode byte
          length = -1
          while (b & 0x80) != 0
            length += 1
            b = b << 1
          end
        else
          # ASCII character
          yield c
          c = ''
        end
      end
    end
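The key step in the listing is counting the leading 1 bits of the first byte to learn how many bytes the character occupies. A minimal sketch of that idea for current Ruby versions follows; `utf8_sequence_length` and `each_utf8_sequence` are hypothetical names, and the sketch assumes well-formed UTF-8 input (a stray continuation byte is not detected).

```ruby
# Hypothetical helper: total number of bytes in the UTF-8 sequence that
# starts with lead_byte. Assumes lead_byte is a valid first byte.
def utf8_sequence_length(lead_byte)
  # 0xxxxxxx is a single-byte ASCII character.
  return 1 if (lead_byte & 0x80).zero?

  # Otherwise the number of leading 1 bits is the sequence length.
  length = 0
  while (lead_byte & 0x80) != 0
    length += 1
    lead_byte = (lead_byte << 1) & 0xFF
  end
  length
end

# Hypothetical helper: yield each character of an Array of UTF-8 bytes.
def each_utf8_sequence(bytes)
  index = 0
  while index < bytes.length
    len = utf8_sequence_length(bytes[index])
    yield bytes[index, len].pack('C*').force_encoding(Encoding::UTF_8)
    index += len
  end
end

chars = []
each_utf8_sequence("a\u20AC\u00E9".bytes) { |ch| chars << ch }
```

Unlike the listing, which tracks the number of continuation bytes still expected, the sketch computes the total sequence length up front and slices the byte array accordingly.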
Ensure the String is really UTF-8 encoded and newlines are only \n. If that's not possible, an Encoding::UndefinedConversionError is raised.
    # File lib/taskjuggler/UTF8String.rb, line 117
    def forceUTF8Encoding
      if RUBY_VERSION < '1.9.0'
        # Ruby 1.8 really only support 7 bit ASCII well. Only do the line-end
        # clean-up.
        gsub(/\r\n/, "\n")
      else
        begin
          # Ensure that the text has LF line ends and is UTF-8 encoded.
          encode('UTF-8', :universal_newline => true)
        rescue
          # The encoding of the String is broken. Find the first broken line and
          # report it.
          lineCtr = 1
          each_line do |line|
            begin
              line.encode('UTF-8')
            rescue
              line = line.encode('UTF-8', :invalid => :replace,
                                 :undef => :replace, :replace => '<?>')
              raise Encoding::UndefinedConversionError,
                    "UTF-8 encoding error in line #{lineCtr}: #{line}"
            end
            lineCtr += 1
          end
        end
      end
    end
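The two steps of the listing, newline normalization and locating the first line that cannot be converted, can be sketched separately. The helper names `normalize_newlines` and `first_unconvertible_line` are hypothetical, and the sample input is assumed to be binary-tagged data containing a Latin-1 byte; only standard `String#encode` options are used.

```ruby
# Hypothetical helper: convert to UTF-8 and normalize line ends to LF.
def normalize_newlines(text)
  text.encode('UTF-8', universal_newline: true)
end

# Hypothetical helper: return [line_number, printable_line] for the first
# line that cannot be converted to UTF-8, or nil if every line converts.
def first_unconvertible_line(text)
  text.each_line.with_index(1) do |line, number|
    begin
      line.encode('UTF-8')
    rescue EncodingError
      # Replace the offending bytes with a visible marker for the report.
      printable = line.encode('UTF-8', invalid: :replace,
                                       undef: :replace, replace: '<?>')
      return [number, printable]
    end
  end
  nil
end

clean  = normalize_newlines("first\r\nsecond\r\n")
# Binary-tagged input with a Latin-1 byte (0xE9) on its second line.
broken = first_unconvertible_line("ok\ncaf\xE9\n".b)
```

The sketch returns the line number and a printable copy of the line instead of raising, which makes the result easy to inspect; the listing raises Encoding::UndefinedConversionError with the same information in its message.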
    # File lib/taskjuggler/UTF8String.rb, line 103
    def to_quoted_printable
      [self].pack('M').gsub(/\n/, "\r\n")
    end
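As the listing shows, the method relies on the `M` directive of `Array#pack` for quoted-printable encoding and then rewrites the line ends as CRLF. A small sketch of the same expression outside the String class, using the hypothetical helper name `quoted_printable_crlf`:

```ruby
# Hypothetical standalone version of the expression used in the listing.
def quoted_printable_crlf(text)
  # pack('M') emits quoted-printable text with LF line ends; mail-style
  # consumers usually expect CRLF, hence the substitution.
  [text].pack('M').gsub(/\n/, "\r\n")
end

encoded = quoted_printable_crlf("caf\u00E9")
# Non-ASCII bytes appear as =XX escapes, and a soft line break ("=")
# is appended because the input does not end with a newline.
```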