Skip to content

tagolym-ml¤

Tag high school math olympiad problems with 10 predefined topics:

Big Topics Algebra Subtopics Geometry Subtopics Number Theory Subtopics
algebra inequality circle modular arithmetic
geometry function trigonometry
number theory polynomial
combinatorics

Input text:

Find all functions \(f:(0,\infty)\rightarrow (0,\infty)\) such that for any \(x,y\in (0,\infty)\),

\[xf(x^2)f(f(y)) + f(yf(x)) = f(xy) \left(f(f(x^2)) + f(f(y^2))\right).\]

Predicted tags:

["algebra", "function"]

Virtual Environment¤

$ git clone https://github.com/dwiuzila/tagolym-ml.git
$ cd tagolym-ml
$ git checkout code_migration
$ python3 -m venv venv
$ source venv/bin/activate
$ python3 -m pip install --upgrade pip
$ python3 -m pip install -e .

Directory¤

config/
├── args_opt.json         - optimized parameters
├── args.json             - preprocessing/training parameters
├── config.py             - configuration setup
├── run_id.txt            - run id of the last model training
├── test_metrics.json     - model performance on test split
├── train_metrics.json    - model performance on train split
└── val_metrics.json      - model performance on validation split

docs/
├── tagolym/
│   ├── data.md           - documentation for data.py
│   ├── evaluate.md       - documentation for evaluate.py
│   ├── main.md           - documentation for main.py
│   ├── predict.md        - documentation for predict.py
│   ├── train.md          - documentation for train.py
│   └── utils.md          - documentation for utils.py
├── index.md              - homepage
├── license.md            - project license
└── logo.png              - project logo

tagolym/
├── _typing.py            - type hints
├── data.py               - data processing components
├── evaluate.py           - evaluation components
├── main.py               - training/optimization pipelines
├── predict.py            - inference components
├── train.py              - training components
└── utils.py              - supplementary utilities

.gitignore                - files/folders that git will ignore

LICENSE                   - project license

README.md                 - longform description of the project

mkdocs.yml                - configuration file for docs

pyproject.toml            - build system dependencies

requirements.txt          - package dependencies

setup.py                  - code packaging

Workflow¤

You wouldn't be able to execute the # query data part in the code snippet below due to data access restrictions. For that, you'd need my credential, which unfortunately is not to be shared. But worry not, I'll provide samples for you to work with. What you need to do is simply download the samples labeled_data.json and save the file in a folder named data in the working directory.

from pathlib import Path
from config import config
from tagolym import main

# query data
key_path = "credentials/bigquery-key.json"
main.elt_data(key_path)

# optimize model
args_fp = Path(config.CONFIG_DIR, "args.json")
main.optimize(args_fp, study_name="optimization", num_trials=10)

# train model
args_fp = Path(config.CONFIG_DIR, "args_opt.json")
main.train_model(args_fp, experiment_name="baselines", run_name="sgd")

# inference
texts = [
    "Let $c,d \geq 2$ be naturals. Let $\{a_n\}$ be the sequence satisfying $a_1 = c, a_{n+1} = a_n^d + c$ for $n = 1,2,\cdots$.Prove that for any $n \geq 2$, there exists a prime number $p$ such that $p|a_n$ and $p \not | a_i$ for $i = 1,2,\cdots n-1$.",
    "Let $ABC$ be a triangle with circumcircle $\Gamma$ and incenter $I$ and let $M$ be the midpoint of $\overline{BC}$. The points $D$, $E$, $F$ are selected on sides $\overline{BC}$, $\overline{CA}$, $\overline{AB}$ such that $\overline{ID} \perp \overline{BC}$, $\overline{IE}\perp \overline{AI}$, and $\overline{IF}\perp \overline{AI}$. Suppose that the circumcircle of $\triangle AEF$ intersects $\Gamma$ at a point $X$ other than $A$. Prove that lines $XD$ and $AM$ meet on $\Gamma$.",
    "Find all functions $f:(0,\infty)\rightarrow (0,\infty)$ such that for any $x,y\in (0,\infty)$, $$xf(x^2)f(f(y)) + f(yf(x)) = f(xy) \left(f(f(x^2)) + f(f(y^2))\right).$$",
    "Let $n$ be an even positive integer. We say that two different cells of a $n \times n$ board are [b]neighboring[/b] if they have a common side. Find the minimal number of cells on the $n \times n$ board that must be marked so that any cell (marked or not marked) has a marked neighboring cell."
]
main.predict_tag(texts=texts)

Documentation¤

See full documentation here.

$ git checkout documentation
$ pip install -e ".[docs]"
$ mkdocs gh-deploy --force