Debian NEW package overview for tokenizers
tokenizers_0.20.3+dfsg-1_amd64.changes
Format: 1.8
Date: Fri, 09 May 2025 14:01:31 +0000
Source: tokenizers
Binary: python3-tokenizers python3-tokenizers-dbgsym
Architecture: source amd64
Version: 0.20.3+dfsg-1
Distribution: experimental
Urgency: medium
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Changed-By: Kohei Sendai <kouhei.sendai@gmail.com>
Description:
python3-tokenizers - Fast State-of-the-Art Tokenizers for Research and Production
Closes: 1109641
Changes:
tokenizers (0.20.3+dfsg-1) experimental; urgency=medium

  * Initial release. (Closes: #1109641)
Files:
f04d31b9a65d1e8fa6ec899b35184074 2986 python optional tokenizers_0.20.3+dfsg-1.dsc
469ab9768f74bc6614add272dcce2dab 1399061 python optional tokenizers_0.20.3+dfsg.orig.tar.gz
f0ed573bdea549183ca5acabe2ca234c 4400 python optional tokenizers_0.20.3+dfsg-1.debian.tar.xz
56584ec25a84d995debcd58ca03a7aac 12156832 debug optional python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb
144951e1fd8eb7ddadeab88ee8013483 1647224 python optional python3-tokenizers_0.20.3+dfsg-1_amd64.deb
2676f63c2cc2468760193659a16a3fbf 26340 python optional tokenizers_0.20.3+dfsg-1_amd64.buildinfo
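Each entry under Files: above records an MD5 digest, a size in bytes, a section, a priority, and a file name. The following is a minimal Python sketch of checking a downloaded file against one such entry. The helper names parse_files_line and matches_entry are hypothetical and not part of any Debian tool, and the five-field layout assumed here is that of a .changes file (a .dsc lists only digest, size, and name).

```python
import hashlib
from pathlib import Path


def parse_files_line(line: str) -> dict:
    """Split one Files: entry of a .changes file into its five fields."""
    md5, size, section, priority, name = line.split()
    return {
        "md5": md5,
        "size": int(size),
        "section": section,
        "priority": priority,
        "name": name,
    }


def matches_entry(entry: dict, directory: Path) -> bool:
    """Return True if the file on disk has the recorded size and MD5 digest."""
    data = (directory / entry["name"]).read_bytes()
    return (
        len(data) == entry["size"]
        and hashlib.md5(data).hexdigest() == entry["md5"]
    )


# Example entry copied from the Files: list above.
entry = parse_files_line(
    "f04d31b9a65d1e8fa6ec899b35184074 2986 python optional "
    "tokenizers_0.20.3+dfsg-1.dsc"
)
print(entry["name"], entry["size"])
```

Calling matches_entry(entry, Path(".")) would then compare a local copy of the .dsc against the recorded values.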
tokenizers_0.20.3+dfsg-1.dsc
Format: 3.0 (quilt)
Source: tokenizers
Binary: python3-tokenizers
Architecture: any
Version: 0.20.3+dfsg-1
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Uploaders: Kohei Sendai <kouhei.sendai@gmail.com>
Homepage: https://github.com/huggingface/tokenizers
Standards-Version: 4.7.2
Vcs-Browser: https://salsa.debian.org/deeplearning-team/tokenizers
Vcs-Git: https://salsa.debian.org/deeplearning-team/tokenizers.git
Testsuite: autopkgtest
Testsuite-Triggers: python3-numpy, python3-pytest, python3-pytest-env, python3-pytest-mock
Build-Depends: debhelper-compat (= 13), dh-sequence-python3, cargo, librust-derive-builder-dev, librust-env-logger-dev, librust-esaxx-rs-dev, librust-indicatif-dev, librust-itertools-dev, librust-libc-dev, librust-macro-rules-attribute-dev, librust-monostate-dev, librust-ndarray-dev, librust-numpy-dev, librust-onig-dev, librust-pyo3-dev, librust-rand-dev, librust-rayon-cond-dev, librust-rayon-dev, librust-regex-dev, librust-regex-syntax-dev, librust-serde-derive-dev, librust-serde-dev, librust-serde-json-dev, librust-spm-precompiled-dev, librust-thiserror-1-dev, librust-unicode-categories-dev, librust-unicode-normalization-alignments-dev, pybuild-plugin-pyproject, python3-all, python3-filelock, python3-fsspec, python3-huggingface-hub, python3-maturin, python3-packaging, python3-requests, python3-setuptools, python3-tqdm, python3-typing-extensions, python3-yaml, rustc
Package-List: python3-tokenizers deb python optional arch=any
Files:
469ab9768f74bc6614add272dcce2dab 1399061 tokenizers_0.20.3+dfsg.orig.tar.gz
f0ed573bdea549183ca5acabe2ca234c 4400 tokenizers_0.20.3+dfsg-1.debian.tar.xz
lintian 2.116.3 check for tokenizers_0.20.3+dfsg-1.dsc
README.source for tokenizers_0.20.3+dfsg-1.dsc
No README.source in this package
control file for python3-tokenizers_0.20.3+dfsg-1_amd64.deb
Package: python3-tokenizers
Source: tokenizers
Version: 0.20.3+dfsg-1
Architecture: amd64
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Installed-Size: 5929
Depends: python3 (<< 3.14), python3 (>= 3.13~), python3-huggingface-hub, python3:any, libc6 (>= 2.34), libgcc-s1 (>= 4.2), libstdc++6 (>= 4.3)
Section: python
Priority: optional
Homepage: https://github.com/huggingface/tokenizers
Description:
Fast State-of-the-Art Tokenizers for Research and Production
 Provides an implementation of today's most used tokenizers, with a focus on
 performance and versatility.
 .
 Main features:
  * Train new vocabularies and tokenize, using today's most used tokenizers.
  * Extremely fast (both training and tokenization), thanks to the Rust
    implementation. Takes less than 20 seconds to tokenize a GB of text on
    a server's CPU.
  * Easy to use, but also extremely versatile.
  * Designed for research and production.
  * Normalization comes with alignments tracking. It's always possible to get
    the part of the original sentence that corresponds to a given token.
  * Does all the pre-processing: Truncate, Pad, add the special tokens your
    model needs.
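The Depends field in the control file above is a comma-separated list of package names, each with an optional version constraint in parentheses. The sketch below splits such a field into (name, operator, version) tuples. The helper parse_depends is hypothetical, and the pattern is a simplification: it does not handle alternatives written with "|" or architecture restrictions in square brackets, neither of which appears in this field.

```python
import re

# A package name, optionally followed by "(operator version)".
# Whitespace around the operator and version is treated as optional.
_DEP = re.compile(
    r"^(?P<name>[^\s(]+)\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<ver>[^)\s]+)\s*\))?$"
)


def parse_depends(field: str) -> list:
    """Return a list of (name, operator, version) tuples for a Depends value."""
    result = []
    for item in field.split(","):
        match = _DEP.match(item.strip())
        if match is None:
            raise ValueError(f"unsupported dependency syntax: {item!r}")
        result.append((match.group("name"), match.group("op"), match.group("ver")))
    return result


for dep in parse_depends(
    "python3 (<< 3.14), python3 (>= 3.13~), python3-huggingface-hub, "
    "python3:any, libc6 (>= 2.34)"
):
    print(dep)
```

Unversioned entries such as python3-huggingface-hub come back with None for both the operator and the version.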
lintian 2.116.3 check for python3-tokenizers_0.20.3+dfsg-1_amd64.deb
contents of python3-tokenizers_0.20.3+dfsg-1_amd64.deb
drwxr-xr-x root/root         0 2025-05-09 14:01 ./
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/
-rw-r--r-- root/root      2615 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/__init__.py
-rw-r--r-- root/root     40182 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/__init__.pyi
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/decoders/
-rw-r--r-- root/root       335 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/decoders/__init__.py
-rw-r--r-- root/root      7244 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/decoders/__init__.pyi
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/
-rw-r--r-- root/root       310 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/__init__.py
-rw-r--r-- root/root     14206 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/base_tokenizer.py
-rw-r--r-- root/root      5520 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/bert_wordpiece.py
-rw-r--r-- root/root      4289 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/byte_level_bpe.py
-rw-r--r-- root/root      5466 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/char_level_bpe.py
-rw-r--r-- root/root      3738 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/sentencepiece_bpe.py
-rw-r--r-- root/root      7580 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/sentencepiece_unigram.py
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/models/
-rw-r--r-- root/root       176 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/models/__init__.py
-rw-r--r-- root/root     16929 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/models/__init__.pyi
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/normalizers/
-rw-r--r-- root/root       841 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/normalizers/__init__.py
-rw-r--r-- root/root     20897 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/normalizers/__init__.pyi
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/pre_tokenizers/
-rw-r--r-- root/root       557 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/pre_tokenizers/__init__.py
-rw-r--r-- root/root     23602 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/pre_tokenizers/__init__.pyi
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/processors/
-rw-r--r-- root/root       307 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/processors/__init__.py
-rw-r--r-- root/root     11357 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/processors/__init__.pyi
-rw-r--r-- root/root   5828864 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tokenizers.cpython-313-x86_64-linux-gnu.so
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/
-rw-r--r-- root/root        55 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/__init__.py
-rw-r--r-- root/root      4850 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/visualizer-styles.css
-rw-r--r-- root/root     14625 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/visualizer.py
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/trainers/
-rw-r--r-- root/root       248 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/trainers/__init__.py
-rw-r--r-- root/root      5382 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/trainers/__init__.pyi
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/
-rw-r--r-- root/root         7 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/INSTALLER
-rw-r--r-- root/root      6716 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/METADATA
-rw-r--r-- root/root        99 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/WHEEL
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/share/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/share/doc/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers/
-rw-r--r-- root/root       170 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers/changelog.Debian.gz
-rw-r--r-- root/root      2741 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers/copyright
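The compiled extension in the listing above is named tokenizers.cpython-313-x86_64-linux-gnu.so, which ties it to CPython 3.13 on amd64. This is consistent with the python3 bounds in the control file (at least 3.13~ and below 3.14). A small sketch, using the hypothetical helper cpython_version_from_filename, recovers the interpreter version from such a file name. It assumes a single-digit major version, which holds for CPython 3.

```python
import re
import sys


def cpython_version_from_filename(filename: str):
    """Extract (major, minor) from a name like 'mod.cpython-313-<triplet>.so'."""
    match = re.search(r"\.cpython-(\d)(\d+)-", filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


wanted = cpython_version_from_filename(
    "tokenizers.cpython-313-x86_64-linux-gnu.so"
)
print("extension built for CPython", wanted)
# Compare against the interpreter running this script.
print("matches running interpreter version:", sys.version_info[:2] == wanted)
```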
control file for python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb
Package: python3-tokenizers-dbgsym
Source: tokenizers
Version: 0.20.3+dfsg-1
Auto-Built-Package: debug-symbols
Architecture: amd64
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Installed-Size: 12723
Depends: python3-tokenizers (= 0.20.3+dfsg-1)
Section: debug
Priority: optional
Description:
debug symbols for python3-tokenizers
Build-Ids: 8a2e139722e1b9212cb184f8fa8aaa53a42783e0
lintian 2.116.3 check for python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb
contents of python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb
drwxr-xr-x root/root         0 2025-05-09 14:01 ./
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/debug/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/debug/.build-id/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/lib/debug/.build-id/8a/
-rw-r--r-- root/root  13017424 2025-05-09 14:01 ./usr/lib/debug/.build-id/8a/2e139722e1b9212cb184f8fa8aaa53a42783e0.debug
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/share/
drwxr-xr-x root/root         0 2025-05-09 14:01 ./usr/share/doc/
lrwxrwxrwx root/root         0 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers-dbgsym -> python3-tokenizers
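The location of the detached debug file in the listing above follows from the Build-Ids value: the first two hexadecimal digits name a directory under /usr/lib/debug/.build-id/, and the remaining digits plus a .debug suffix name the file. A minimal sketch of that mapping, with the hypothetical helper debug_file_for_build_id:

```python
def debug_file_for_build_id(build_id: str) -> str:
    """Map an ELF build ID (hex string) to its detached debug file path."""
    build_id = build_id.strip().lower()
    if len(build_id) < 3:
        raise ValueError("build ID is too short to split into directory and file")
    # First two hex digits form the directory, the rest the file name.
    return f"/usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"


print(debug_file_for_build_id("8a2e139722e1b9212cb184f8fa8aaa53a42783e0"))
```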

Timestamp: 22.07.2025 / 15:02:33 (UTC)