@startuml
!include jb-plantuml-theme.puml

skinparam linetype ortho
top to bottom direction

header
A [[https://en.wikipedia.org/wiki/ZIP_(file_format) ZIP]] file format with optimized metadata.
endheader

component "File Entry 1" as FE1
component "File Entry N" as FE2

note right of FE1
  The relative offset of the local file header does not point directly to the data,
  but rather to the header itself.
  This means that you need to perform two seeks in order to locate the actual data,
  as the size of the local file header can vary.

  As an optimization, you can attempt to precompute the data offset
  when reading the central directory file header.
  This optimization is implemented in the HashMapZipFile class.
  However, ImmutableZipFile uses a special index for this purpose, as explained below.
end note

FE1 -- FE2

component "File entry ~__index__" as INDEX {
  component "A list of keys along with their corresponding offsets and sizes." as INDEX_M
  note right of INDEX_M
    A list of pairs consisting of long values.
    Each pair includes a key, represented as a 64-bit XXH3 hash of an entry name,
    and an offset and size represented as two ints packed into a single long value.

    This list enables the retrieval of data locations for all entries
    in a single bulk read operation.
    It contains no file names or other unnecessary metadata.
  end note

  component "class package hashes" as INDEX_PC
  note right of INDEX_PC
    A list of long values representing the 64-bit XXH3 hash of a package name.
    This list is not used by the ZipFile implementation but is consumed by the class loader.
    It allows for a quick determination of whether a class name is located within a ZIP file or not.

    While it does not provide much benefit for a single ZIP file,
    as name lookup can be done with a single map lookup,
    it enables the clustering of multiple ZIP files.
    This clustering helps avoid a linear search across all ZIP files in a classpath.
  end note

  component "resource package hashes" as INDEX_PR
  note right of INDEX_PR
    The same concept applies to resource package hashes.
    However, there are two different sets of hashes since there is no correlation
    between class packages and resource packages.
  end note

  component names {
    component "name lengths" as INDEX_NL
    note right of INDEX_NL
      A list of name lengths represented as shorts.
      This list enables the reading of integers in a single bulk read operation,
      directly from native memory.
    end note

    component "names" as INDEX_NS
    note right of INDEX_NS
      List of strings.
    end note

    INDEX_NL -down- INDEX_NS
  }
  note bottom of names
    Entry names.
    They are not loaded into memory when the ZipFile is opened;
    instead, they are loaded only when requested.
    This is useful, for instance, when you want to process entries based on their names,
    such as finding entries by a specific prefix.
  end note

  INDEX_M -- INDEX_PC
  INDEX_PC -- INDEX_PR
  INDEX_PR -- names
}

note top of INDEX
  The Zip specification is not violated.
  The index data represents a regular file entry.
end note

FE2 -- INDEX

component "Central directory" as CD
note right of CD
  The index format version is stored in the 'File comment' field.
  Only the latest format is supported.
  If a ZIP file does not have a comment or the index version is not equal to the latest,
  a fallback implementation is used that is capable of reading any ZIP file.
end note

INDEX -- CD
@enduml
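
/'
The index note above says that an offset and a size are stored as two ints
packed into a single long. A minimal Java sketch of that packing follows.
It is an illustration only, not the actual implementation: the class name
PackedLocation and the bit layout (offset in the high 32 bits, size in the
low 32 bits) are assumptions, since the diagram does not state the order.

```java
// Hypothetical helper; the name and the bit layout are assumptions.
final class PackedLocation {
    private PackedLocation() {
    }

    // Packs two ints into one long: offset in the high half, size in the low half.
    static long pack(int offset, int size) {
        // Mask the size so its sign bit does not spill into the offset half.
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    // Extracts the high 32 bits.
    static int offset(long packed) {
        return (int) (packed >>> 32);
    }

    // Extracts the low 32 bits.
    static int size(long packed) {
        return (int) packed;
    }
}
```

With this layout, one long[] read in bulk can hold alternating hash keys and
packed locations, and each location is decoded with two cheap bit operations.
'/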