tokenizers_0.20.3+dfsg-1_amd64.changes

Format: 1.8
Date: Fri, 09 May 2025 14:01:31 +0000
Source: tokenizers
Binary: python3-tokenizers python3-tokenizers-dbgsym
Architecture: source amd64
Version: 0.20.3+dfsg-1
Distribution: experimental
Urgency: medium
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Changed-By: Kohei Sendai <kouhei.sendai@gmail.com>
Description:
 python3-tokenizers - Fast State-of-the-Art Tokenizers for Research and Production
Closes: 1109641
Changes:
 tokenizers (0.20.3+dfsg-1) experimental; urgency=medium
 .
   * Initial release. (Closes: #1109641)
Files:
 f04d31b9a65d1e8fa6ec899b35184074 2986 python optional tokenizers_0.20.3+dfsg-1.dsc
 469ab9768f74bc6614add272dcce2dab 1399061 python optional tokenizers_0.20.3+dfsg.orig.tar.gz
 f0ed573bdea549183ca5acabe2ca234c 4400 python optional tokenizers_0.20.3+dfsg-1.debian.tar.xz
 56584ec25a84d995debcd58ca03a7aac 12156832 debug optional python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb
 144951e1fd8eb7ddadeab88ee8013483 1647224 python optional python3-tokenizers_0.20.3+dfsg-1_amd64.deb
 2676f63c2cc2468760193659a16a3fbf 26340 python optional tokenizers_0.20.3+dfsg-1_amd64.buildinfo
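
Each entry in the Files: field gives an MD5 checksum, a size in bytes, a section, a priority, and a file name; consumers use these to verify downloaded artifacts. A minimal sketch of that check in Python (assumes the .deb has been downloaded to the current directory; hash and size are copied verbatim from the field above):

```python
# Minimal sketch: verify one artifact against its Files: entry above.
# Assumes python3-tokenizers_0.20.3+dfsg-1_amd64.deb is in the current
# directory; MD5 and size are copied verbatim from the .changes file.
import hashlib
import os

name = "python3-tokenizers_0.20.3+dfsg-1_amd64.deb"
expected_md5 = "144951e1fd8eb7ddadeab88ee8013483"
expected_size = 1647224

with open(name, "rb") as f:
    digest = hashlib.md5(f.read()).hexdigest()

assert os.path.getsize(name) == expected_size, "size mismatch"
assert digest == expected_md5, "checksum mismatch"
print(f"{name}: OK")
```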

tokenizers_0.20.3+dfsg-1.dsc

Format: 3.0 (quilt)
Source: tokenizers
Binary: python3-tokenizers
Architecture: any
Version: 0.20.3+dfsg-1
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Uploaders: Kohei Sendai <kouhei.sendai@gmail.com>
Homepage: https://github.com/huggingface/tokenizers
Standards-Version: 4.7.2
Vcs-Browser: https://salsa.debian.org/deeplearning-team/tokenizers
Vcs-Git: https://salsa.debian.org/deeplearning-team/tokenizers.git
Testsuite: autopkgtest
Testsuite-Triggers: python3-numpy, python3-pytest, python3-pytest-env, python3-pytest-mock
Build-Depends: debhelper-compat (= 13), dh-sequence-python3, cargo, librust-derive-builder-dev, librust-env-logger-dev, librust-esaxx-rs-dev, librust-indicatif-dev, librust-itertools-dev, librust-libc-dev, librust-macro-rules-attribute-dev, librust-monostate-dev, librust-ndarray-dev, librust-numpy-dev, librust-onig-dev, librust-pyo3-dev, librust-rand-dev, librust-rayon-cond-dev, librust-rayon-dev, librust-regex-dev, librust-regex-syntax-dev, librust-serde-derive-dev, librust-serde-dev, librust-serde-json-dev, librust-spm-precompiled-dev, librust-thiserror-1-dev, librust-unicode-categories-dev, librust-unicode-normalization-alignments-dev, pybuild-plugin-pyproject, python3-all, python3-filelock, python3-fsspec, python3-huggingface-hub, python3-maturin, python3-packaging, python3-requests, python3-setuptools, python3-tqdm, python3-typing-extensions, python3-yaml, rustc
Package-List:
 python3-tokenizers deb python optional arch=any
Files:
 469ab9768f74bc6614add272dcce2dab 1399061 tokenizers_0.20.3+dfsg.orig.tar.gz
 f0ed573bdea549183ca5acabe2ca234c 4400 tokenizers_0.20.3+dfsg-1.debian.tar.xz

control file for python3-tokenizers_0.20.3+dfsg-1_amd64.deb

Package: python3-tokenizers
Source: tokenizers
Version: 0.20.3+dfsg-1
Architecture: amd64
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Installed-Size: 5929
Depends: python3 (<< 3.14), python3 (>= 3.13~), python3-huggingface-hub, python3:any, libc6 (>= 2.34), libgcc-s1 (>= 4.2), libstdc++6 (>= 4.3)
Section: python
Priority: optional
Homepage: https://github.com/huggingface/tokenizers
Description: Fast State-of-the-Art Tokenizers for Research and Production
 Provides an implementation of today's most used tokenizers, with a focus on
 performance and versatility.
 .
 Main features:
  * Train new vocabularies and tokenize, using today's most used tokenizers.
  * Extremely fast (both training and tokenization), thanks to the Rust
    implementation. Takes less than 20 seconds to tokenize a GB of text on a
    server's CPU.
  * Easy to use, but also extremely versatile.
  * Designed for research and production.
  * Normalization comes with alignments tracking. It's always possible to get
    the part of the original sentence that corresponds to a given token.
  * Does all the pre-processing: Truncate, Pad, add the special tokens your
    model needs.
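
The features listed above map directly onto the Python API this package installs. A minimal sketch of that workflow, assuming a hypothetical plain-text training file `corpus.txt` (not shipped with the package):

```python
# Minimal sketch of the workflow described above: train a BPE vocabulary,
# enable truncation/padding, and read back alignment offsets.
# "corpus.txt" is a hypothetical plain-text file, not part of this package.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

# Pre-processing handled by the library itself: truncate and pad.
tokenizer.enable_truncation(max_length=128)
tokenizer.enable_padding(pad_id=tokenizer.token_to_id("[PAD]"),
                         pad_token="[PAD]")

encoding = tokenizer.encode("Hello, world!")
print(encoding.tokens)   # tokens produced for the input
print(encoding.offsets)  # (start, end) spans in the original sentence,
                         # i.e. the alignment tracking mentioned above
```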

contents of python3-tokenizers_0.20.3+dfsg-1_amd64.deb

drwxr-xr-x root/root 0 2025-05-09 14:01 ./
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/
-rw-r--r-- root/root 2615 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/__init__.py
-rw-r--r-- root/root 40182 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/__init__.pyi
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/decoders/
-rw-r--r-- root/root 335 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/decoders/__init__.py
-rw-r--r-- root/root 7244 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/decoders/__init__.pyi
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/
-rw-r--r-- root/root 310 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/__init__.py
-rw-r--r-- root/root 14206 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/base_tokenizer.py
-rw-r--r-- root/root 5520 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/bert_wordpiece.py
-rw-r--r-- root/root 4289 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/byte_level_bpe.py
-rw-r--r-- root/root 5466 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/char_level_bpe.py
-rw-r--r-- root/root 3738 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/sentencepiece_bpe.py
-rw-r--r-- root/root 7580 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/implementations/sentencepiece_unigram.py
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/models/
-rw-r--r-- root/root 176 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/models/__init__.py
-rw-r--r-- root/root 16929 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/models/__init__.pyi
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/normalizers/
-rw-r--r-- root/root 841 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/normalizers/__init__.py
-rw-r--r-- root/root 20897 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/normalizers/__init__.pyi
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/pre_tokenizers/
-rw-r--r-- root/root 557 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/pre_tokenizers/__init__.py
-rw-r--r-- root/root 23602 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/pre_tokenizers/__init__.pyi
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/processors/
-rw-r--r-- root/root 307 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/processors/__init__.py
-rw-r--r-- root/root 11357 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/processors/__init__.pyi
-rw-r--r-- root/root 5828864 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tokenizers.cpython-313-x86_64-linux-gnu.so
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/
-rw-r--r-- root/root 55 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/__init__.py
-rw-r--r-- root/root 4850 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/visualizer-styles.css
-rw-r--r-- root/root 14625 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/tools/visualizer.py
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/trainers/
-rw-r--r-- root/root 248 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/trainers/__init__.py
-rw-r--r-- root/root 5382 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers/trainers/__init__.pyi
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/
-rw-r--r-- root/root 7 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/INSTALLER
-rw-r--r-- root/root 6716 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/METADATA
-rw-r--r-- root/root 99 2025-05-09 14:01 ./usr/lib/python3/dist-packages/tokenizers-0.20.3.dist-info/WHEEL
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/share/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/share/doc/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers/
-rw-r--r-- root/root 170 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers/changelog.Debian.gz
-rw-r--r-- root/root 2741 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers/copyright
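
The Python-visible classes are bound through the compiled tokenizers.cpython-313-x86_64-linux-gnu.so listed above, so a quick import is enough to confirm the extension loads (a minimal sketch; assumes the package is installed):

```python
# Post-install sanity check: importing tokenizers loads the compiled
# Rust extension (tokenizers.cpython-313-x86_64-linux-gnu.so) above.
import tokenizers

print(tokenizers.__version__)  # expected: 0.20.3
print(tokenizers.Tokenizer)    # class exported by the extension module
```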

copyright of python3-tokenizers_0.20.3+dfsg-1_amd64.deb

Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Source: https://github.com/huggingface/tokenizers
Comment: Excluded files are documentation files, which include a font file
 with an unclear license.
Files-Excluded: docs/*

Files: *
Copyright: 2019-2025 The HuggingFace Team
           2019-2025 Anthony MOI <m.anthony.moi@gmail.com>
           2019-2025 Nicolas Patry <nicolas@huggingface.co>
License: Apache-2.0

Files: bindings/node/*
Copyright: 2019-2025 Anthony MOI <m.anthony.moi@gmail.com>
License: Apache-2.0
Comment: This directory is based on the napi-rs template
 (https://github.com/napi-rs/package-template). It includes a LICENSE file
 that indicates MIT; however, the file is identical to the template's, and
 both README and package.json say Apache-2.0.

Files: tokenizers/examples/unstable_wasm/www/*
Copyright: Ashley Williams <ashley666ashley@gmail.com>
License: Expat or Apache-2.0

Files: debian/*
Copyright: 2025 Kohei Sendai <kouhei.sendai@gmail.com>
License: Apache-2.0

License: Apache-2.0
 Licensed under the Apache License, Version 2.0 (the "License"); you may not
 use this file except in compliance with the License. You may obtain a copy
 of the License at
 .
 http://www.apache.org/licenses/LICENSE-2.0
 .
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 License for the specific language governing permissions and limitations
 under the License.
 .
 On Debian systems, the complete text of the Apache version 2.0 license can
 be found in "/usr/share/common-licenses/Apache-2.0".

License: Expat
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to
 deal in the Software without restriction, including without limitation the
 rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
 sell copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 .
 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.
 .
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 IN THE SOFTWARE.

control file for python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb

Package: python3-tokenizers-dbgsym
Source: tokenizers
Version: 0.20.3+dfsg-1
Auto-Built-Package: debug-symbols
Architecture: amd64
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Installed-Size: 12723
Depends: python3-tokenizers (= 0.20.3+dfsg-1)
Section: debug
Priority: optional
Description: debug symbols for python3-tokenizers
Build-Ids: 8a2e139722e1b9212cb184f8fa8aaa53a42783e0

contents of python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb

drwxr-xr-x root/root 0 2025-05-09 14:01 ./
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/debug/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/debug/.build-id/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/lib/debug/.build-id/8a/
-rw-r--r-- root/root 13017424 2025-05-09 14:01 ./usr/lib/debug/.build-id/8a/2e139722e1b9212cb184f8fa8aaa53a42783e0.debug
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/share/
drwxr-xr-x root/root 0 2025-05-09 14:01 ./usr/share/doc/
lrwxrwxrwx root/root 0 2025-05-09 14:01 ./usr/share/doc/python3-tokenizers-dbgsym -> python3-tokenizers
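
The debug file sits under the standard GNU build-id layout: the first two hex digits of the Build-Ids: value become a directory and the remainder the file name, which is how gdb locates detached symbols. A minimal sketch of that mapping:

```python
# Minimal sketch: derive the detached debug file path from the GNU
# build-id in the Build-Ids: field above.
build_id = "8a2e139722e1b9212cb184f8fa8aaa53a42783e0"
debug_path = f"/usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
print(debug_path)
# -> /usr/lib/debug/.build-id/8a/2e139722e1b9212cb184f8fa8aaa53a42783e0.debug
```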

copyright of python3-tokenizers-dbgsym_0.20.3+dfsg-1_amd64.deb

WARNING: No copyright found, please check package manually. |