Coverage for torrentfile\torrent.py: 100%
282 statements
coverage.py v7.3.0, created at 2023-08-27 21:50 -0700
#! /usr/bin/python3
# -*- coding: utf-8 -*-

##############################################################################
# Copyright (C) 2021-current alexpdev
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""
Classes and procedures pertaining to the creation of torrent meta files.

Classes
-------

- `TorrentFile`
    construct .torrent file.

- `TorrentFileV2`
    construct .torrent v2 files using provided data.

- `TorrentFileHybrid`
    construct v1/v2 hybrid .torrent files.

- `TorrentAssembler`
    construct v2 or hybrid .torrent files.

- `MetaFile`
    base class for all torrent file classes.

Constants
---------

- BLOCK_SIZE : int
    size in bytes of the leaf blocks hashed for the merkle tree.

- HASH_SIZE : int
    Length of a sha256 hash.

Bittorrent V2
-------------

**From Bittorrent.org Documentation pages.**

*Implementation details for Bittorrent Protocol v2.*

!!!Note
    All strings in a .torrent file that contain text must be UTF-8 encoded.

### Meta Version 2 Dictionary:

- "announce":
    The URL of the tracker.

- "info":
    This maps to a dictionary, with keys described below.

    - "name":
        A display name for the torrent. It is purely advisory.

    - "piece length":
        The number of bytes that each logical piece in the peer
        protocol refers to. I.e. it sets the granularity of piece, request,
        bitfield and have messages. It must be a power of two and at least
        16KiB. Files are mapped into this piece address space so that each
        non-empty file is aligned to a piece boundary and occurs in the same
        order as in the file tree. The last piece of each file may be
        shorter than the specified piece length, resulting in an alignment
        gap.

    - "meta version":
        An integer value, set to 2 to indicate compatibility
        with the current revision of this specification. Version 1 is not
        assigned to avoid confusion with BEP3. Future revisions will only
        increment this value to indicate an incompatible change has been
        made, for example that hash algorithms were changed due to newly
        discovered vulnerabilities. Implementations must check this field
        first and indicate that a torrent is of a newer version than they
        can handle before performing other validations which may result in
        more general messages about invalid files.

    - "file tree":
        A tree of dictionaries where dictionary keys represent UTF-8
        encoded path elements. Entries with zero-length keys describe the
        properties of the composed path at that point. 'UTF-8 encoded' in
        this context only means that if the native encoding is known at
        creation time it must be converted to UTF-8. Keys may contain
        invalid UTF-8 sequences or characters and names that are reserved on
        specific filesystems. Implementations must be prepared to sanitize
        them. On most platforms path components exactly matching '.' and
        '..' must be sanitized since they could lead to directory traversal
        attacks and conflicting path descriptions. On platforms that require
        valid UTF-8 path components this sanitizing step must happen after
        normalizing overlong UTF-8 encodings.

        - "length":
            Length of the file in bytes. Presence of this field indicates
            that the dictionary describes a file, not a directory. Which
            means it must not have any sibling entries.

        - "pieces root":
            For non-empty files this is the root hash of a merkle
            tree with a branching factor of 2, constructed from 16KiB blocks
            of the file. The last block may be shorter than 16KiB. The
            remaining leaf hashes beyond the end of the file required to
            construct upper layers of the merkle tree are set to zero. As of
            meta version 2 SHA2-256 is used as digest function for the
            merkle tree. The hash is stored in its binary form, not as
            human-readable string.

- "piece layers":
    A dictionary of strings. For each file in the file tree that
    is larger than the piece size it contains one string value.
    The keys are the merkle roots while the values consist of concatenated
    hashes of one layer within that merkle tree. The layer is chosen so
    that one hash covers piece length bytes. For example if the piece
    size is 16KiB then the leaf hashes are used. If a piece size of
    128KiB is used then 3rd layer up from the leaf hashes is used. Layer
    hashes which exclusively cover data beyond the end of file, i.e.
    are only needed to balance the tree, are omitted. All hashes are
    stored in their binary format. A torrent is not valid if this field is
    absent, the contained hashes do not match the merkle roots or are
    not from the correct layer.

!!!important
    The file tree root dictionary itself must not be a file,
    i.e. it must not contain a zero-length key with a dictionary containing
    a length key.

Bittorrent V1
-------------

### v1 meta-dictionary

- announce:
    The URL of the tracker.

- info:
    This maps to a dictionary, with keys described below.

    - `name`:
        maps to a UTF-8 encoded string which is the suggested name to
        save the file (or directory) as. It is purely advisory.

    - `piece length`:
        maps to the number of bytes in each piece the file is split
        into. For the purposes of transfer, files are split into
        fixed-size pieces which are all the same length except for
        possibly the last one which may be truncated. `piece length`
        is almost always a power of two, most commonly 2^18 = 256 KiB.

    - `pieces`:
        maps to a string whose length is a multiple of 20. It is to be
        subdivided into strings of length 20, each of which is the SHA1
        hash of the piece at the corresponding index.

    - `length`:
        Only present if the content is a single file. Maps to the length
        of the file in bytes.

    - `files`:
        Only present in the multi-file case. Either `length` or `files`
        is present, but never both. If `length` is present then the
        download represents a single file, otherwise it represents a set
        of files which go in a directory structure.
        For the purposes of the other keys, the multi-file case is treated
        as only having a single file by concatenating the files in the order
        they appear in the files list. The files list is the value `files`
        maps to, and is a list of dictionaries containing the following keys:

        - `path`:
            A list of UTF-8 encoded strings corresponding to subdirectory
            names, the last of which is the actual file name.

        - `length`:
            Maps to the length of the file in bytes.

!!!Note
    In the single file case, the name key is the name of a file,
    in the multiple file case, it's the name of a directory.
"""

import os
import logging
from collections.abc import Sequence
from datetime import datetime

import pyben

from torrentfile import utils
from torrentfile.hasher import FileHasher, Hasher, HasherHybrid, HasherV2
from torrentfile.mixins import ProgMixin
from torrentfile.version import __version__ as version

logger = logging.getLogger(__name__)
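
# The "pieces root" entry in the module docstring can be sketched roughly as
# follows: hash each 16KiB block with SHA2-256, fill the missing leaves with
# zeroed hashes, then reduce the leaves pairwise until one hash remains.
# `pieces_root` and `_next_power_of_two` are hypothetical names used only for
# illustration, and the constants are local to the sketch. The package's own
# hashing lives in torrentfile.hasher and may differ in detail.

```python
import hashlib

BLOCK_SIZE = 2**14  # 16KiB leaf blocks (assumed value, per the docstring)
HASH_SIZE = 32      # length of a SHA2-256 digest in bytes


def _next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (and >= 1)."""
    size = 1
    while size < n:
        size *= 2
    return size


def pieces_root(data: bytes) -> bytes:
    """Compute a merkle root over 16KiB blocks of a non-empty byte string."""
    if not data:
        # Empty files carry no "pieces root" entry at all.
        raise ValueError("empty files have no pieces root")
    # Leaf layer: one hash per block; the last block may be short.
    layer = [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]
    # Leaves beyond the end of the file are set to zero.
    layer += [bytes(HASH_SIZE)] * (_next_power_of_two(len(layer)) - len(layer))
    # Branching factor of 2: hash adjacent pairs until a single root is left.
    while len(layer) > 1:
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```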


class MetaFile:
    """
    Base Class for all TorrentFile classes.

    Parameters
    ----------
    path : str
        target path to torrent content.  Default: None
    announce : str or list
        One or more tracker URLs.  Default: None
    comment : str
        A comment.  Default: None
    align : bool
        If True, add padding files to the v1 file list.  Default: False
    piece_length : int
        Size of torrent pieces.  Default: None
    private : bool
        For private trackers.  Default: False
    outfile : str
        target path to write .torrent file.  Default: None
    source : str
        Private tracker source.  Default: None
    progress : int or str
        level of progress bar displayed.  Default: 1
    cwd : bool
        If True change default save location to current directory.
    httpseeds : list
        one or more web addresses where torrent content can be found.
    url_list : list
        one or more web addresses where torrent content exists.
    content : str
        alias for 'path' arg.
    meta_version : int
        indicates which Bittorrent protocol to use for hashing content.
    """

    hasher = None

    @classmethod
    def set_callback(cls, func):
        """
        Assign a callback function for the Hashing class to call for each hash.

        Parameters
        ----------
        func : function
            The callback function which accepts a single parameter.
        """
        if "hasher" in vars(cls) and vars(cls)["hasher"]:
            cls.hasher.set_callback(func)

    def __init__(
        self,
        path=None,
        announce=None,
        comment=None,
        align=False,
        piece_length=None,
        private=False,
        outfile=None,
        source=None,
        progress=1,
        cwd=False,
        httpseeds=None,
        url_list=None,
        content=None,
        meta_version=None,
        **_,
    ):
        """
        Construct MetaFile superclass and assign local attributes.
        """
        self.private = private
        self.cwd = cwd
        self.outfile = outfile
        self.progress = int(progress)
        self.comment = comment
        self.source = source
        self.meta_version = meta_version

        if content:
            path = content
        if not path:
            # Fall back to the last item of a list option when that item
            # is an existing path on disk.
            if announce and len(announce) > 1 and os.path.exists(announce[-1]):
                path = announce[-1]
                announce = announce[:-1]
            elif url_list and os.path.exists(url_list[-1]):
                path = url_list[-1]
                url_list = url_list[:-1]
            elif httpseeds and os.path.exists(httpseeds[-1]):
                path = httpseeds[-1]
                httpseeds = httpseeds[:-1]
            else:
                raise utils.MissingPathError("Path to content is required.")

        # base path to torrent content.
        self.path = path

        logger.debug("path parameter found %s", path)

        self.meta = {
            "created by": f"torrentfile_v{version}",
            "creation date": int(datetime.timestamp(datetime.now())),
            "info": {},
        }

        # Format piece_length attribute.
        if piece_length:
            self.piece_length = utils.normalize_piece_length(piece_length)
            logger.debug("piece length parameter found %s", piece_length)
        else:
            self.piece_length = utils.path_piece_length(self.path)
            logger.debug("piece length calculated %s", self.piece_length)

        # Assign announce URL to empty string if none provided.
        if not announce:
            self.announce, self.announce_list = "", [[""]]

        # Most torrent clients have editing trackers as a feature.
        elif isinstance(announce, str):
            self.announce, self.announce_list = announce, [[announce]]

        elif isinstance(announce, Sequence):
            self.announce, self.announce_list = announce[0], [announce]

        self.align = align

        if self.announce:
            self.meta["announce"] = self.announce
            self.meta["announce-list"] = self.announce_list
        if comment:
            self.meta["info"]["comment"] = comment
            logger.debug("comment parameter found %s", comment)
        if private:
            self.meta["info"]["private"] = 1
            logger.debug("private parameter triggered")
        if source:
            self.meta["info"]["source"] = source
            logger.debug("source parameter found %s", source)
        if url_list:
            self.meta["url-list"] = url_list
            logger.debug("url list parameter found %s", str(url_list))
        if httpseeds:
            self.meta["httpseeds"] = httpseeds
            logger.debug("httpseeds parameter found %s", str(httpseeds))
        self.meta["info"]["piece length"] = self.piece_length

        # Use the last path component as the name, even when the path
        # ends with a separator.
        parent, self.name = os.path.split(self.path)
        if not self.name:
            self.name = os.path.basename(parent)
        self.meta["info"]["name"] = self.name

    def assemble(self):
        """
        Overload in subclasses.

        Raises
        ------
        NotImplementedError
            Always, unless overridden by a subclass.
        """
        raise NotImplementedError

    def sort_meta(self):
        """Sort the info and meta dictionaries."""
        logger.debug("sorting dictionary keys")
        meta = self.meta
        meta["info"] = dict(sorted(list(meta["info"].items())))
        meta = dict(sorted(list(meta.items())))
        return meta

    def write(self, outfile=None) -> tuple:
        """
        Write meta information to .torrent file.

        Final step in the torrent file creation process.
        After hashing and sorting every piece of content
        write the contents to file using the bencode encoding.

        Parameters
        ----------
        outfile : str
            Destination path for .torrent file. default=None

        Returns
        -------
        outfile : str
            Where the .torrent file was written.
        meta : dict
            .torrent meta information.
        """
        if outfile:
            self.outfile = outfile
        if not self.outfile:  # pragma: nocover
            path = os.path.join(os.getcwd(), self.name) + ".torrent"
            self.outfile = path
        # A trailing separator means outfile names a directory.
        if str(self.outfile)[-1] in "\\/":
            self.outfile = self.outfile + (self.name + ".torrent")
        self.meta = self.sort_meta()
        try:
            pyben.dump(self.meta, self.outfile)
        except PermissionError as excp:
            logger.error("Permission Denied: Could not write to %s",
                         self.outfile)
            raise PermissionError from excp
        return self.outfile, self.meta
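
# The write step above sorts the dictionaries and then hands them to pyben
# for bencoding. As a rough illustration of what that encoding looks like,
# here is a minimal standalone encoder. `bencode` is a hypothetical helper
# written for this sketch; it is not pyben's API and pyben's implementation
# is not shown here.

```python
def bencode(obj) -> bytes:
    """Encode ints, strings, bytes, lists and dicts using bencode rules."""
    if isinstance(obj, bool):
        # bool is a subclass of int; reject it rather than guess an encoding.
        raise TypeError("bencode has no boolean type")
    if isinstance(obj, int):
        # Integers: i<decimal>e
        return b"i%de" % obj
    if isinstance(obj, str):
        # Text is stored as UTF-8 encoded byte strings.
        obj = obj.encode("utf-8")
    if isinstance(obj, (bytes, bytearray)):
        # Byte strings: <length>:<raw bytes>
        return b"%d:%s" % (len(obj), bytes(obj))
    if isinstance(obj, (list, tuple)):
        # Lists: l<items>e
        return b"l" + b"".join(bencode(item) for item in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionaries: d<key><value>...e with keys sorted as raw bytes.
        pairs = [
            (key.encode("utf-8") if isinstance(key, str) else bytes(key), val)
            for key, val in obj.items()
        ]
        pairs.sort(key=lambda pair: pair[0])
        body = b"".join(bencode(key) + bencode(val) for key, val in pairs)
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")
```

# Because the dictionary branch sorts keys itself, insertion order in the
# input dictionary does not change the encoded bytes.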


class TorrentFile(MetaFile, ProgMixin):
    """
    Class for creating Bittorrent meta files.

    Construct *Torrentfile* class instance object.

    Parameters
    ----------
    **kwargs : dict
        Dictionary containing torrent file options.
    """

    hasher = Hasher

    def __init__(self, **kwargs):
        """
        Construct TorrentFile instance with given keyword args.

        Parameters
        ----------
        **kwargs : dict
            dictionary of keyword args passed to superclass.
        """
        super().__init__(**kwargs)
        logger.debug("Assembling bittorrent v1 torrent file")
        self.assemble()

    def assemble(self):
        """
        Assemble components of torrent metafile.

        The "info" dictionary of ``self.meta`` is filled in place.
        """
        info = self.meta["info"]
        size, filelist = utils.filelist_total(self.path)
        kws = {
            "progress": self.progress,
            "progress_bar": None,
            "align": self.align,
        }

        if self.progress == 2:
            self.prog_bar = self.get_progress_tracker(size, str(self.path))
            kws["progress_bar"] = self.prog_bar

        elif self.progress == 0:
            self.prog_bar = self.get_progress_tracker(-1, "")
            kws["progress_bar"] = self.prog_bar

        if os.path.isfile(self.path):
            info["length"] = size
        elif not self.align:
            info["files"] = [{
                "length":
                os.path.getsize(path),
                "path":
                os.path.relpath(path, self.path).split(os.sep),
            } for path in filelist]
        else:
            info["files"] = []
            for path in filelist:
                filesize = os.path.getsize(path)
                info["files"].append({
                    "length":
                    filesize,
                    "path":
                    os.path.relpath(path, self.path).split(os.sep),
                })
                if filesize < self.piece_length:
                    remainder = self.piece_length - filesize
                else:
                    # Bytes needed to reach the next piece boundary;
                    # zero when the file already ends on one.
                    remainder = (-filesize) % self.piece_length
                if remainder:
                    info["files"].append({
                        "attr": "p",
                        "length": remainder,
                        "path": [".pad", str(remainder)],
                    })
        pieces = bytearray()
        feeder = Hasher(filelist, self.piece_length, **kws)
        for piece in feeder:
            pieces.extend(piece)
        info["pieces"] = pieces
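
# When `align` is set, the file list above gains padding entries. One way to
# size such an entry, assuming the goal is for the next file to start on a
# piece boundary, is sketched below. `pad_length` and `pad_entry` are
# hypothetical helpers for illustration; the entry layout mirrors the
# dictionaries built in `assemble`.

```python
def pad_length(filesize: int, piece_length: int) -> int:
    """Return the bytes needed to extend filesize to a piece boundary."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    # Python's modulo of a negative number is non-negative, so this is
    # 0 for aligned sizes and piece_length - (filesize % piece_length)
    # otherwise.
    return (-filesize) % piece_length


def pad_entry(length: int) -> dict:
    """Build a padding-file entry shaped like the ones in the file list."""
    return {"attr": "p", "length": length, "path": [".pad", str(length)]}
```

# Note that under this assumption an empty file needs no padding, since it
# occupies no bytes of the piece address space.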


class TorrentFileV2(MetaFile, ProgMixin):
    """
    Class for creating Bittorrent meta v2 files.

    Parameters
    ----------
    **kwargs : dict
        Keyword arguments for torrent file options.
    """

    hasher = HasherV2

    def __init__(self, **kwargs):
        """
        Construct `TorrentFileV2` Class instance from given parameters.

        Parameters
        ----------
        **kwargs : dict
            keyword arguments to pass to superclass.
        """
        super().__init__(**kwargs)
        logger.debug("Assembling bittorrent v2 torrent file")
        self.piece_layers = {}
        self.hashes = []
        size, file_list = utils.filelist_total(self.path)
        self.kws = {"progress": self.progress, "progress_bar": None}
        self.total = len(file_list)

        if self.progress == 2:
            self.prog_bar = self.get_progress_tracker(size, str(self.path))
            self.kws["progress_bar"] = self.prog_bar

        elif self.progress == 0:
            self.prog_bar = self.get_progress_tracker(-1, "")
            self.kws["progress_bar"] = self.prog_bar

        self.assemble()

    def assemble(self):
        """
        Assemble the meta dictionary for encoding.

        The file tree and piece layers are stored in ``self.meta`` in place.
        """
        info = self.meta["info"]
        if os.path.isfile(self.path):
            info["file tree"] = {info["name"]: self._traverse(self.path)}
            info["length"] = os.path.getsize(self.path)
        else:
            info["file tree"] = self._traverse(self.path)

        info["meta version"] = 2
        self.meta["piece layers"] = self.piece_layers

    def _traverse(self, path: str) -> dict:
        """
        Walk directory tree.

        Parameters
        ----------
        path : str
            Path to file or directory.

        Returns
        -------
        dict
            File tree entry for the given path.
        """
        if os.path.isfile(path):
            # Calculate Size and hashes for each file.
            size = os.path.getsize(path)

            # Empty files have a length but no pieces root.
            if size == 0:
                return {"": {"length": size}}

            logger.debug("Hashing %s", str(path))
            fhash = HasherV2(path, self.piece_length, **self.kws)

            # Only files larger than one piece get a piece layer entry.
            if size > self.piece_length:
                self.piece_layers[fhash.root] = fhash.piece_layer
            return {"": {"length": size, "pieces root": fhash.root}}

        file_tree = {}
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                file_tree[name] = self._traverse(os.path.join(path, name))
        return file_tree
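
# The "piece layers" rule from the module docstring (one hash per piece
# length bytes) fixes which merkle layer is stored. A small sketch of that
# arithmetic follows, assuming 16KiB leaf blocks and 32 byte hashes.
# `layers_above_leaves` and `split_layer` are hypothetical names used only
# for illustration.

```python
BLOCK_SIZE = 2**14  # 16KiB leaf blocks (assumed value, per the docstring)
HASH_SIZE = 32      # length of a SHA2-256 digest in bytes


def layers_above_leaves(piece_length: int) -> int:
    """Return how many layers above the leaf hashes the piece layer sits."""
    is_power_of_two = piece_length & (piece_length - 1) == 0
    if piece_length < BLOCK_SIZE or not is_power_of_two:
        raise ValueError("piece length must be a power of two >= 16KiB")
    # Each layer up doubles the bytes covered by one hash, so the answer
    # is log2(piece_length / BLOCK_SIZE).
    return (piece_length // BLOCK_SIZE).bit_length() - 1


def split_layer(layer: bytes) -> list:
    """Split a concatenated piece layer value into individual hashes."""
    if len(layer) % HASH_SIZE:
        raise ValueError("layer length must be a multiple of the hash size")
    return [layer[i:i + HASH_SIZE] for i in range(0, len(layer), HASH_SIZE)]
```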


class TorrentFileHybrid(MetaFile, ProgMixin):
    """
    Construct the Hybrid torrent meta file with provided parameters.

    Parameters
    ----------
    **kwargs : dict
        Keyword arguments for torrent options.
    """

    hasher = HasherHybrid

    def __init__(self, **kwargs):
        """
        Create Bittorrent v1 v2 hybrid metafiles.
        """
        super().__init__(**kwargs)
        logger.debug("Assembling bittorrent Hybrid file")
        self.name = os.path.basename(self.path)
        self.hashes = []
        self.piece_layers = {}
        self.pieces = []
        self.files = []
        size, file_list = utils.filelist_total(self.path)
        self.kws = {"progress": self.progress, "progress_bar": None}
        self.total = len(file_list)

        if self.progress == 0:
            self.prog_bar = self.get_progress_tracker(-1, "")
            self.kws["progress_bar"] = self.prog_bar

        elif self.progress == 2:
            self.prog_bar = self.get_progress_tracker(size, str(self.path))
            self.kws["progress_bar"] = self.prog_bar

        self.assemble()

    def assemble(self):
        """
        Assemble the parts of the torrentfile into meta dictionary.
        """
        info = self.meta["info"]
        info["meta version"] = 2

        if os.path.isfile(self.path):
            info["file tree"] = {self.name: self._traverse(self.path)}
            info["length"] = os.path.getsize(self.path)

        else:
            info["file tree"] = self._traverse(self.path)
            info["files"] = self.files

        info["pieces"] = b"".join(self.pieces)
        self.meta["piece layers"] = self.piece_layers
        return info

    def _traverse(self, path: str) -> dict:
        """
        Build meta dictionary while walking directory.

        Parameters
        ----------
        path : str
            Path to target file.
        """
        if os.path.isfile(path):
            file_size = os.path.getsize(path)

            self.files.append({
                "length":
                file_size,
                "path":
                os.path.relpath(path, self.path).split(os.sep),
            })

            if file_size == 0:
                return {"": {"length": file_size}}

            logger.debug("Hashing %s", str(path))
            file_hash = HasherHybrid(path, self.piece_length, **self.kws)

            if file_size > self.piece_length:
                self.piece_layers[file_hash.root] = file_hash.piece_layer

            self.hashes.append(file_hash)
            self.pieces.extend(file_hash.pieces)

            if file_hash.padding_file:
                self.files.append(file_hash.padding_file)

            return {"": {"length": file_size, "pieces root": file_hash.root}}

        tree = {}
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                tree[name] = self._traverse(os.path.join(path, name))
        return tree


class TorrentAssembler(MetaFile, ProgMixin):
    """
    Assembler class for Bittorrent version 2 and hybrid meta files.

    This differs from TorrentFileV2 and TorrentFileHybrid because its
    hasher is consumed as an iterator and it works for both versions.

    Parameters
    ----------
    **kwargs : dict
        Keyword arguments for torrent options.
    """

    hasher = FileHasher

    def __init__(self, **kwargs):
        """
        Create Bittorrent v2 or v1 v2 hybrid metafiles.
        """
        super().__init__(**kwargs)
        logger.debug("Assembling bittorrent Hybrid file")
        self.name = os.path.basename(self.path)
        self.hashes = []
        self.piece_layers = {}
        self.pieces = bytearray()
        self.files = []
        # Hybrid output is selected by a meta_version equal to the string "3".
        self.hybrid = self.meta_version == "3"
        size, file_list = utils.filelist_total(self.path)
        self.kws = {
            "progress": self.progress,
            "progress_bar": None,
            "hybrid": self.hybrid,
        }
        self.total = len(file_list)

        if self.progress == 2:
            self.prog_bar = self.get_progress_tracker(size, str(self.path))
            self.kws["progress_bar"] = self.prog_bar

        elif self.progress == 0:
            self.prog_bar = self.get_progress_tracker(-1, "")
            self.kws["progress_bar"] = self.prog_bar

        self.assemble()

    def assemble(self):
        """
        Assemble the parts of the torrentfile into meta dictionary.
        """
        info = self.meta["info"]
        info["meta version"] = 2

        if os.path.isfile(self.path):
            info["file tree"] = {self.name: self._traverse(self.path)}
            info["length"] = os.path.getsize(self.path)

        else:
            info["file tree"] = self._traverse(self.path)
            if self.hybrid:
                info["files"] = self.files

        if self.hybrid:
            info["pieces"] = self.pieces
        self.meta["piece layers"] = self.piece_layers
        return info

    def _traverse(self, path: str) -> dict:
        """
        Build meta dictionary while walking directory.

        Parameters
        ----------
        path : str
            Path to target file.
        """
        if os.path.isfile(path):
            file_size = os.path.getsize(path)
            if self.hybrid:
                self.files.append({
                    "length":
                    file_size,
                    "path":
                    os.path.relpath(path, self.path).split(os.sep),
                })

            if file_size == 0:
                return {"": {"length": file_size}}

            logger.debug("Hashing %s", str(path))
            hasher = FileHasher(path, self.piece_length, **self.kws)
            layers = bytearray()
            # In hybrid mode each result pairs a layer hash with a v1 piece
            # hash; otherwise it is the layer hash alone.
            for result in hasher:
                if self.hybrid:
                    layer_hash, piece = result
                    self.pieces.extend(piece)
                else:
                    layer_hash = result
                layers.extend(layer_hash)
            if file_size > self.piece_length:
                self.piece_layers[hasher.root] = layers
            if self.hybrid and hasher.padding_file:
                self.files.append(hasher.padding_file)

            return {"": {"length": file_size, "pieces root": hasher.root}}

        tree = {}
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                tree[name] = self._traverse(os.path.join(path, name))
        return tree