Aggregation Functions API
API reference for aggregation functions used in groupBy and agg operations.
Overview
Aggregation functions collapse multiple rows into a single value. They are typically used within groupBy() or agg() calls.
Basic Aggregations
sum()
Calculate the sum of a numeric column.
sum(column: string | ColumnRef): AggregationExprParameters:
column- Column name or reference to sum
Returns: Aggregation expression
Example:
import { sum, col } from "molniya";
// Sum of column
sum("amount")
// Sum of expression
sum(col("price").mul(col("quantity")))
// In groupBy
df.groupBy("category", [
{ name: "total", expr: sum("amount") }
])Null Handling: Null values are ignored in the sum.
avg()
Calculate the average (arithmetic mean) of a numeric column.
avg(column: string | ColumnRef): AggregationExprAlias: mean()
Example:
import { avg } from "molniya";
avg("score")
avg(col("total"))Null Handling: Null values are ignored in the average.
min()
Find the minimum value in a column.
min(column: string | ColumnRef): AggregationExprExample:
import { min } from "molniya";
min("price")
min("date") // Works with dates tooNull Handling: Null values are ignored.
max()
Find the maximum value in a column.
max(column: string | ColumnRef): AggregationExprExample:
import { max } from "molniya";
max("price")
max("date")Null Handling: Null values are ignored.
count()
Count rows or non-null values.
count(): AggregationExpr // Count all rows
count(column: string | ColumnRef): AggregationExpr // Count non-null valuesExample:
import { count } from "molniya";
// Count all rows
count()
// Count non-null values in column
count("email")
// In groupBy
df.groupBy("category", [
{ name: "total_rows", expr: count() },
{ name: "with_email", expr: count("email") }
])Null Handling:
count()counts all rows including nullscount(column)counts only non-null values
Statistical Aggregations
std()
Calculate the standard deviation.
std(column: string | ColumnRef): AggregationExprExample:
import { std } from "molniya";
std("value")Null Handling: Null values are ignored.
var()
Calculate the variance.
var(column: string | ColumnRef): AggregationExprExample:
import { var as variance } from "molniya";
variance("value")Null Handling: Null values are ignored.
median()
Calculate the median value.
median(column: string | ColumnRef): AggregationExprExample:
import { median } from "molniya";
median("income")Null Handling: Null values are ignored.
Position Aggregations
first()
Get the first value in the group.
first(column: string | ColumnRef): AggregationExprExample:
import { first } from "molniya";
first("name")
first(col("created_at"))Ordering
The "first" value depends on the current row ordering. Use sort() before groupBy() for predictable results.
last()
Get the last value in the group.
last(column: string | ColumnRef): AggregationExprExample:
import { last } from "molniya";
last("status")
last(col("updated_at"))Distinct Aggregations
countDistinct()
Count distinct/unique values.
countDistinct(column: string | ColumnRef): AggregationExprExample:
import { countDistinct } from "molniya";
df.groupBy("region", [
{ name: "unique_customers", expr: countDistinct("customer_id") },
{ name: "total_orders", expr: count() }
])Aggregation with DataFrame.agg()
Apply aggregations to the entire DataFrame:
import { sum, avg, min, max, count } from "molniya";
// Single aggregation
const total = await df.agg(sum("amount"));
// Multiple aggregations
const stats = await df.agg([
sum("amount"),
avg("amount"),
min("amount"),
max("amount"),
count()
]);Conditional Aggregations
Filter aggregations to specific rows:
import { sum, count, col } from "molniya";
df.groupBy("category", [
{
name: "high_value_total",
expr: sum("amount").filter(col("amount").gt(1000))
},
{
name: "high_value_count",
expr: count().filter(col("amount").gt(1000))
}
])Aggregating Expressions
Use expressions within aggregations:
import { sum, avg, col } from "molniya";
df.groupBy("category", [
{ name: "total_revenue", expr: sum(col("price").mul(col("quantity"))) },
{ name: "avg_discount_pct", expr: avg(col("discount").mul(100)) }
])Combining Aggregations
Multiple aggregations in a single groupBy:
import { sum, avg, min, max, count, first, last } from "molniya";
df.groupBy("department", [
{ name: "total_salary", expr: sum("salary") },
{ name: "avg_salary", expr: avg("salary") },
{ name: "min_salary", expr: min("salary") },
{ name: "max_salary", expr: max("salary") },
{ name: "employee_count", expr: count() },
{ name: "first_hire", expr: first("hire_date") },
{ name: "latest_hire", expr: last("hire_date") }
])Performance Notes
- Aggregations are computed in a single pass when possible
- Null handling adds minimal overhead
- Statistical functions (std, var, median) require more computation
- Count distinct uses a hash set and may use significant memory for high cardinality
Type Safety
Aggregation functions preserve type information:
// sum of int32 returns int64 (to prevent overflow)
// avg of numeric returns float64
// min/max preserve input type
// count always returns int64