Add option for automatic subtitle character encoding normalization (#68)
* Add option for automatic subtitle character encoding normalization

  The rationale behind this function is that some services use ISO-8859-1 (latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether intentionally or accidentally. Some services even stream subtitles with malformed/mixed encoding (each segment has a different encoding).

* Remove Subtitle parameter `auto_fix_encoding`

  Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.

* Move Subtitle encoding fixing code out of if drm tree
* Use chardet as a last ditch effort fixing Subs, or return original data
* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8
* Add Shivelight as a contributor

---------

Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
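
The fallback order described in the message above (UTF-8, then CP-1252, then chardet, else the original data) could look roughly like the sketch below. This is an illustration under assumptions, not the code from this commit: only the name `try_ensure_utf8` comes from the message, while the signature, the optional import guard, and the error handling are guesses.

```python
try:
    import chardet  # third-party detector; the dependency added by this commit
except ImportError:  # assumption: keep the sketch runnable without chardet
    chardet = None


def try_ensure_utf8(data: bytes) -> bytes:
    """Return `data` re-encoded as UTF-8 if possible, else the original bytes.

    Hypothetical signature; only the function name appears in the commit message.
    """
    # 1. Already valid UTF-8: hand the bytes back untouched.
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        pass

    # 2. Try CP-1252, which agrees with ISO-8859-1 on all printable characters.
    try:
        return data.decode("cp1252").encode("utf-8")
    except UnicodeDecodeError:
        pass

    # 3. Last-ditch effort: let chardet guess the encoding, if it is installed.
    if chardet is not None:
        guess = chardet.detect(data).get("encoding")
        if guess:
            try:
                return data.decode(guess).encode("utf-8")
            except (UnicodeDecodeError, LookupError):
                pass

    # 4. Nothing worked: return the original data unchanged.
    return data


# Mixed-encoding stream: each segment is normalized on its own.
segments = ["naïve ".encode("utf-8"), "café".encode("cp1252")]
normalized = b"".join(try_ensure_utf8(segment) for segment in segments)
```

Normalizing per segment rather than on the joined stream matters for the mixed-encoding case: once a UTF-8 segment and a CP-1252 segment are concatenated, the whole buffer fails UTF-8 validation and the valid segment would be re-decoded as CP-1252 and corrupted.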
@@ -60,6 +60,7 @@ sortedcontainers = "^2.4.0"
 subtitle-filter = "^1.4.6"
 Unidecode = "^1.3.6"
 urllib3 = "^2.0.4"
+chardet = "^5.2.0"
 
 [tool.poetry.dev-dependencies]
 pre-commit = "^3.4.0"