Why SQL for machine learning?
Most ML tutorials start with import pandas and end with 200 lines of Python. That's fine if you're a data scientist, but what about the analyst who understands the business problem better than anyone and already thinks in SQL?
DataLAB lets you train production-grade ML models using SnapQL commands you can learn quickly.
Step 1: Load your data
Start by loading or connecting your dataset. DataLAB supports CSV, Excel, Parquet, JSON, and direct SQL sources.
In practice, this is often a business analyst or data lead pulling in a customer extract, support snapshot, or billing export before a retention or forecasting discussion.
SELECT * FROM customer_data LIMIT 5;

You can also connect directly to SQL Server, PostgreSQL, MySQL, Oracle, or SQLite.
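For comparison, the same first-five-rows peek in a conventional scripting workflow takes noticeably more setup. Here is a minimal sketch using Python's standard-library sqlite3 module; the in-memory table and its columns are illustrative stand-ins, not DataLAB internals:

```python
import sqlite3

# Build a small illustrative customer_data table in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_data (customer_id INTEGER, tenure INTEGER, "
    "monthly_charges REAL, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_data VALUES (?, ?, ?, ?)",
    [(i, i * 3, 50.0 + i, i % 2) for i in range(1, 11)],
)

# Equivalent of: SELECT * FROM customer_data LIMIT 5;
first_five = conn.execute("SELECT * FROM customer_data LIMIT 5").fetchall()
for row in first_five:
    print(row)
```

The query itself is the same; the boilerplate around it is what DataLAB's direct connectors remove.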
Step 2: Explore
Understand your data before modelling:
This is the point where a team usually wants quick answers before it commits to a full modelling run: how many rows are here, how many unique customers do we have, and how imbalanced is the target?
SELECT
COUNT(*) AS rows,
COUNT(DISTINCT customer_id) AS unique_customers,
AVG(monthly_charges) AS avg_charges,
SUM(CASE WHEN churned = 1 THEN 1 ELSE 0 END) AS churned_count
FROM customer_data;

Step 3: Train the model
Here's where the core SnapQL workflow starts:
Think of a churn analyst or growth team lead who already knows the likely drivers and wants a first serious model without leaving the SQL environment they are already using for data prep.
CREATE MODEL churn_predictor
USING RandomForest
ON customer_data
PREDICT churned
FEATURES tenure, monthly_charges, contract_type,
total_charges, num_support_tickets;

DataLAB automatically:
- Encodes categorical features like contract_type
- Splits data using the configured training and evaluation defaults
- Trains the model and evaluates performance
- Stores the model with full metadata for reuse
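To make that automation concrete, here is a rough standard-library Python sketch of two of those steps, one-hot encoding a categorical column and holding out an evaluation split. The rows and the 80/20 ratio are illustrative assumptions, not DataLAB's actual internals:

```python
import random

# Illustrative rows: (tenure, contract_type, churned).
rows = [(12, "monthly", 1), (48, "annual", 0),
        (6, "monthly", 1), (30, "two_year", 0)] * 5

# One-hot encode the categorical contract_type column.
categories = sorted({r[1] for r in rows})

def encode(row):
    tenure, contract, churned = row
    one_hot = [1 if contract == c else 0 for c in categories]
    return [tenure] + one_hot + [churned]

encoded = [encode(r) for r in rows]

# Hold out 20% of rows for evaluation (seeded shuffle for repeatability).
random.seed(42)
random.shuffle(encoded)
split = int(len(encoded) * 0.8)
train, evaluation = encoded[:split], encoded[split:]
print(len(train), len(evaluation))  # 16 4
```

Every step like this is code someone has to write, test, and maintain in a scripted workflow; in SnapQL it is folded into the one CREATE MODEL statement.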
Step 4: Check model status
MODEL STATUS churn_predictor;

This returns the saved model's status and the tracked training output for the run. That is useful when a team is deciding whether the first model is good enough to share internally or whether it needs another pass.
If you want to try a stronger algorithm, create another model:
CREATE MODEL churn_v2
USING XGBoost
ON customer_data
PREDICT churned
FEATURES tenure, monthly_charges, contract_type,
total_charges, num_support_tickets;

Step 5: Make predictions
Apply your model to new data:
This is where the work becomes operational. A team can take a fresh customer list, score it, and hand the resulting population to retention, sales, or customer-success teams for action.
PREDICT USING MODEL churn_predictor
ON new_customers
AS churn_predictions;

The result is a new dataset you can query, export, or feed into a larger workflow.
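The shape of that step can be sketched in plain Python. The "model" below is a stand-in threshold rule, not what DataLAB actually trains, and the customer rows are invented; the point is that scoring produces a new, queryable result set alongside the input columns:

```python
# Stand-in "model": a simple threshold rule, not a trained DataLAB model.
def churn_predictor(row):
    # Flag short-tenure, high-charge customers as churn risks.
    return 1 if row["tenure"] < 12 and row["monthly_charges"] > 70 else 0

new_customers = [
    {"customer_id": 101, "tenure": 5, "monthly_charges": 89.5},
    {"customer_id": 102, "tenure": 40, "monthly_charges": 55.0},
    {"customer_id": 103, "tenure": 2, "monthly_charges": 99.0},
]

# Equivalent of PREDICT ... AS churn_predictions: input rows plus a score column.
churn_predictions = [
    {**row, "churned_pred": churn_predictor(row)} for row in new_customers
]
flagged = [r["customer_id"] for r in churn_predictions if r["churned_pred"] == 1]
print(flagged)  # [101, 103]
```

The flagged list is exactly the population a retention team would pick up and work through.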
Step 6: Compare multiple runs
DataLAB's experiment tracking lets you compare models side by side:
This matters most when the work stops being a one-person exercise and becomes a team discussion about tradeoffs between speed, accuracy, explainability, and deployment fit.
CREATE EXPERIMENT churn_comparison;
USE EXPERIMENT churn_comparison;
CREATE MODEL rf_model USING RandomForest ON customer_data PREDICT churned;
CREATE MODEL xgb_model USING XGBoost ON customer_data PREDICT churned;
CREATE MODEL lr_model USING LogisticRegression ON customer_data PREDICT churned;
LIST RUNS FROM churn_comparison LIMIT 10;

That gives your team a consistent way to review how different algorithms behaved inside the same experiment context.
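The comparison itself is conceptually simple, and a small Python sketch shows the shape of it. The three "algorithms" here are toy rules standing in for Random Forest, XGBoost, and Logistic Regression, and the labeled rows are invented; what matters is that every run is scored on the same data and tracked in one place:

```python
# Tiny labeled set: (tenure, churned). Values are illustrative.
data = [(3, 1), (5, 1), (8, 1), (24, 0), (36, 0), (48, 0), (10, 1), (60, 0)]

# Stand-in "algorithms": simple rules playing the role of RF/XGB/LR.
models = {
    "rf_model": lambda t: 1 if t < 12 else 0,
    "xgb_model": lambda t: 1 if t < 9 else 0,
    "lr_model": lambda t: 1,  # always predicts churn
}

# One tracked run per model, all evaluated on the same data,
# like runs recorded inside a single experiment.
runs = []
for name, predict in models.items():
    accuracy = sum(predict(t) == y for t, y in data) / len(data)
    runs.append({"model": name, "accuracy": accuracy})

runs.sort(key=lambda r: r["accuracy"], reverse=True)
for run in runs:
    print(run["model"], run["accuracy"])
```

Holding the data and evaluation fixed across runs is what makes the side-by-side numbers comparable, which is the job the experiment context does for you.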
Beyond the basics
DataLAB supports a wide model surface, including:
- Classification: Random Forest, XGBoost, SVM, Logistic Regression, KNN, Naive Bayes, and more
- Regression: Linear, Ridge, Lasso, Random Forest, XGBoost, SVR, and related variants
- Clustering: K-Means, DBSCAN, Gaussian Mixture
- Time series and advanced workflows: via the broader SnapQL and pipeline surface
Plus AutoML for automated model selection:
That is useful when a team wants a fast tournament across candidate models without spending the first week manually tuning every option.
AUTOML churn_auto
FROM customer_data
PREDICT churned
MAX_TIME 300;

AutoML tries multiple algorithms and parameter combinations, then returns the strongest candidate within your time budget.
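The time-budgeted tournament idea can be sketched in a few lines of Python. The candidate names, fit costs, and scores below are mock values, and real AutoML would actually train and evaluate each candidate; the sketch only shows the budget-and-keep-the-best loop:

```python
import time

# Stand-in candidates: (name, fit cost in seconds, mock evaluation score).
candidates = [("logreg", 0.01, 0.78), ("rf", 0.02, 0.84), ("xgb", 0.02, 0.86)]

def automl(candidates, max_time):
    """Try candidates in order, keep the best, stop when the budget runs out."""
    deadline = time.monotonic() + max_time
    best_name, best_score = None, float("-inf")
    for name, cost, score in candidates:
        if time.monotonic() + cost > deadline:
            break  # not enough budget left to fit this candidate
        time.sleep(cost)  # stand-in for training time
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

print(automl(candidates, max_time=1.0))  # ('xgb', 0.86)
```

With a generous budget every candidate gets a turn; with a tight one, the loop returns the best model it managed to finish, which is the behaviour MAX_TIME buys you.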
Try it yourself
All of the syntax above is aligned to the current SnapQL language reference. Request early access and we can show you how DataLAB fits your analytics or predictive workflow.