design pt1

Thomas Hobson 2021-02-18 23:09:16 +13:00
parent 30426d0d96
commit 3103721c8e
4 changed files with 163 additions and 1 deletions

@ -1,4 +1,4 @@
== Brief ==
== Brief == [ Piston ]
This document covers the overall architecture of Piston v3, and not the
individual components and their implementations.

design/api.txt Normal file

@ -0,0 +1,75 @@
== Piston API == [ Piston ]
When we speak of Piston, what we actually mean is the Piston API.
This API provides unrestricted, unlimited access to managing Piston and
thus shouldn't be publicly exposed. It is comparable to that of the
Docker Engine, where everything regarding control of Docker goes directly
through the API.
The API is responsible for managing the execution lifecycle of any given
job, as well as managing the different languages in which it can execute a
job.
== Job Execution ==
Piston v3 exposes an `/execute` endpoint per package, which when called takes
a string of code, an array of arguments to pass to the program, and data to
write to STDIN. The stdout and stderr from the process are then returned
separately, along with the error code the process exited with.
None of this has rate-limiting built in, which keeps latency low: a call
directly starts the runner process and gets under way immediately.
The 2 stages of this process, compile and run, are run in sequence, each with
its own timeout configurable in the runner's config file located in the
data directory.
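
The run stage described above can be sketched as follows. This is only an
illustration, not Piston's runner: the function name `run_job`, the result keys
and the use of the local Python interpreter as the "language" are all
assumptions made for the example, and the compile stage is omitted.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_job(code, args, stdin, run_timeout=10.0):
    """Run `code` with the current Python interpreter and capture its output.

    Hypothetical stand-in for a runner: a real package would supply its own
    compile and run steps, each with its own configured timeout.
    """
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "main.py"
        source.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(source), *args],
            input=stdin,            # data written to the program's STDIN
            capture_output=True,    # stdout and stderr are kept separate
            text=True,
            timeout=run_timeout,    # raises subprocess.TimeoutExpired if exceeded
        )
    return {"stdout": proc.stdout, "stderr": proc.stderr, "code": proc.returncode}
```

Calling `run_job(code, ["hello"], "world\n")` returns the two output streams
and the process's exit code as separate fields, mirroring what `/execute` is
described as returning.
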
Requests to this endpoint can have caching enabled at 3 different levels.
The first option is to have no caching, which is the default for all
interpreted languages. The second option is for the compiled binaries to be
cached, which is the default for all compiled languages. The final option is
for output to be cached, which isn't used by default but can be enabled per
package or per request. Output caching is opt-in because code may source data
from /dev/(u)random or similar sources, in which case a cached output would
not match a fresh run. Caching is per package and is used as an acceleration
method to boost the performance of Piston. Cache entries are automatically
purged after a set time, or can be manually purged through the API on a
per-package basis.
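
A minimal sketch of the caching behaviour described above, assuming an
in-memory store. The names `CacheLevel` and `JobCache`, the choice of cache key
(a hash of code, arguments and stdin) and the injectable clock are assumptions
for illustration; the design does not specify them.

```python
import hashlib
import time
from enum import Enum


class CacheLevel(Enum):
    NONE = "none"      # default for interpreted languages
    BINARY = "binary"  # default for compiled languages
    OUTPUT = "output"  # opt-in per package or per request


class JobCache:
    """In-memory cache with expiry, kept separately for each package."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # package -> {key: (expires_at, value)}

    @staticmethod
    def key(code, args, stdin):
        digest = hashlib.sha256()
        for part in (code, *args, stdin):
            data = part.encode()
            # A length prefix keeps ("a", "b") distinct from ("ab",).
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
        return digest.hexdigest()

    def get(self, package, key):
        entry = self.entries.get(package, {}).get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.entries[package][key]  # purge the expired entry
            return None
        return value

    def put(self, package, key, value):
        self.entries.setdefault(package, {})[key] = (self.clock() + self.ttl, value)

    def purge(self, package):
        """Manually drop every cache entry belonging to one package."""
        self.entries.pop(package, None)
```

Under `CacheLevel.BINARY` the cached value would be a compiled artifact, and
under `CacheLevel.OUTPUT` the captured output of a run; the store itself does
not need to know which.
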
== Package Manager ==
Piston v3 has an inbuilt package manager which is responsible for
(un)installing different packages. Piston v3 by default has access to a single
official repository hosting various versions of various common languages. These
packages and repositories conform to the specifications set out in ppman.txt.
The Piston API service downloads the repository index whenever a `/packages`
request is issued to a repository with the `sync` flag set. This causes
the service to download the latest repository index from the mirror.
In Piston there is no concept of a package being "outdated", as each package is
a specific version of a language, and different languages can be installed in
parallel and function without any issues. Each package should be considered the
final version of that language. If there is a new version of a language
available (e.g. Python 3.9.1 -> 3.9.2), a new package should be created for
it.
Individual languages can be queried from the repo using the
`/repos/{repo}/packages/{package}/{package-version}` endpoint. This endpoint
allows for the metadata of the package to be accessed, such as the author,
size, checksums, dependencies, build file git url and download url.
To install a package, a request to `/install` can be made on the package
endpoint; this downloads and installs it, making it available on the
`/packages/{package}/{version}` endpoint.
There is a meta-repository named `all` which can be used to access all
repositories.
Internally the install process involves downloading and unpacking the package,
ensuring any dependencies are also downloaded and installed, mounting the
squashfs filesystem to a folder, then overlaying it with all its dependencies
in another folder.
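
The dependency step of the install process can be sketched as a depth-first
walk that yields dependencies before the package that needs them. The function
name `install_order` and the shape of the `dependencies` mapping are
assumptions for illustration; downloading, mounting and overlaying are not
shown.

```python
def install_order(package, dependencies):
    """Return packages in install order: every dependency before its dependant.

    `dependencies` maps a package name to the names it depends on.
    """
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return  # already scheduled for installation
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order
```

For a mapping such as `{"typescript": ["node"], "node": []}`, installing
`typescript` would schedule `node` first.
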

design/index.txt Normal file

@ -0,0 +1,13 @@
== Index == [ Piston ]
These documents outline the design of the different components and do not give
a concrete definition of the implementation or how to use it.
api.txt         Design of Piston API
ppman.txt       Design of the package manager's package and repository format
== Glossary ==
Execution Job   A single code run with arguments resulting in an output
Package         A version of a language bundled together into a tarball

design/ppman.txt Normal file

@ -0,0 +1,74 @@
== Package Manager (ppman) == [ Piston ]
The package manager is the part of the API responsible for managing different
versions of different languages, including their installation, uninstallation
and dependencies. The package manager talks over the Piston API and is
built directly into Piston, although it has parts which are not built
into the API (i.e. the repositories and the CLI utility).
The package manager is a complex part of Piston, and requires 2 different file
specifications: the repository index file and the package file.
== Repository Index File ==
The Piston repository is the central place where packages are hosted and
downloaded from. This repository can either be a webserver or a local file
containing the right content; as long as it's accessible by a URL, it's
considered a valid repository by Piston. A repository URL is simply a URL
pointing to a repository index file, as described below.
A repository index file is a YAML file containing the keys: `schema`, `baseurl`
and `packages`.
The schema key should have a value of `ppman-repo-1`. This indicates the
version and file format for the client to expect.
The baseurl key contains the base URL that relative URLs are resolved against.
It doesn't need to be related to the URL that the repository index is hosted
at, only to the downloadable files, which can be split over many domains
by using absolute URLs.
The packages key contains a list of packages, each of which contains the keys:
`author`, `language`, `version`, `checksums`, `dependencies`, `size`,
`buildfile` and `download`.
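
To make the key layout concrete, here is a sketch of a structural check on an
index that has already been parsed from YAML into a dictionary. The function
`validate_index` is hypothetical, and every value in `example_index` (hosts,
file names, size, digest) is made up for illustration.

```python
REQUIRED_TOP_LEVEL = {"schema", "baseurl", "packages"}
REQUIRED_PACKAGE_KEYS = {
    "author", "language", "version", "checksums",
    "dependencies", "size", "buildfile", "download",
}


def validate_index(index):
    """Return a list of problems found in a parsed repository index."""
    problems = []
    missing = REQUIRED_TOP_LEVEL - index.keys()
    if missing:
        problems.append(f"missing top-level keys: {sorted(missing)}")
    if index.get("schema") != "ppman-repo-1":
        problems.append(f"unsupported schema: {index.get('schema')!r}")
    for position, package in enumerate(index.get("packages", [])):
        missing = REQUIRED_PACKAGE_KEYS - package.keys()
        if missing:
            problems.append(f"package {position}: missing keys {sorted(missing)}")
    return problems


# Made-up example; none of these values come from a real repository.
example_index = {
    "schema": "ppman-repo-1",
    "baseurl": "https://repo.example.com/packages/",
    "packages": [
        {
            "author": "Full Name <email@address>",
            "language": "python",
            "version": "3.9.1",
            "checksums": {"sha256": "0" * 64},
            "dependencies": {},
            "size": 1048576,
            "buildfile": "https://repo.example.com/build/python-3.9.1.sh",
            "download": "python-3.9.1.pkg",
        }
    ],
}
```

An empty list from `validate_index(example_index)` means all required keys are
present; it says nothing about whether the values themselves are sensible.
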
The author field is self-explanatory: it is simply the author's name and email,
formatted similar to git's default format: `Full Name <email@address>`. If the
repository index is automatically generated, it is best to use the commit
author's name here.
The language and version fields define the name and version of the compiler or
interpreter contained within. The language should not include a version at all.
In the case of Python, use the name python for both Python 2 and 3, using the
version field to differentiate between the two.
The checksums field is a map of hash types to hashes; hash types include
md5, sha1, sha256 and sha512. The digests should be written as lowercase
hex characters. Only one checksum is required, but if more are supplied the
most secure one is used, with sha512 as the strongest.
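
Picking the strongest supplied checksum can be sketched with Python's
`hashlib`. The function name `verify_checksum` and the exact preference order
between the weaker hashes are assumptions; the design only states that sha512
ranks highest.

```python
import hashlib

# Strongest first. Only sha512's top position is stated by the design;
# the rest of the ordering is an assumption.
PREFERENCE = ["sha512", "sha256", "sha1", "md5"]


def verify_checksum(data, checksums):
    """Check `data` against the strongest hash type present in `checksums`."""
    for hash_type in PREFERENCE:
        if hash_type in checksums:
            actual = hashlib.new(hash_type, data).hexdigest()
            # Digests are expected in lowercase hex; lower() tolerates
            # an index that used uppercase by mistake.
            return actual == checksums[hash_type].lower()
    raise ValueError("no supported checksum supplied")
```

Weaker hashes are ignored once a stronger one is present, so a matching md5
does not rescue a package whose sha512 differs.
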
The dependencies field is a map of language names to versions which must be
installed for the package to run correctly. For example, typescript requires
node to run.
The size field is the size of the package file in bytes when uncompressed.
This is used to determine if there is enough room to install it, and thus
should be accurate.
The buildfile field is a URL pointing to the exact build script for this build.
This should always point to a URL either containing steps, a script or other
means of reproducing the build. This field exists purely so people can
understand how the image was built and verify that no malicious code was
packed into it.
The final field is download, which points to a URL from which the package file
can be obtained. If this is a relative URL, it is resolved against the
baseurl. This is particularly useful if everything is stored within one S3
bucket, or you have a repository in a folder.
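
How a relative download value might be combined with the baseurl, assuming
standard URL-joining rules as implemented by `urllib.parse.urljoin`. The
function name `resolve_download` and the example hosts are hypothetical, and
the design does not say which joining rules Piston applies.

```python
from urllib.parse import urljoin


def resolve_download(baseurl, download):
    """Resolve a package's download field against the repository baseurl.

    Absolute download URLs are returned unchanged, which is what allows
    files to be spread over several domains.
    """
    return urljoin(baseurl, download)


relative = resolve_download(
    "https://repo.example.com/packages/", "python-3.9.1.pkg"
)
absolute = resolve_download(
    "https://repo.example.com/packages/",
    "https://cdn.example.net/python-3.9.1.pkg",
)
```

With these rules a baseurl should end in `/`; otherwise its last path segment
is replaced rather than extended. `urljoin` also only joins schemes it
recognises, such as http, https and file.
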
== Package File ==
TODO