Building Expressions
Expressions are the building blocks of data transformations in Molniya. They represent computations that are evaluated lazily when you execute a DataFrame action.
What is an Expression?
An expression (Expr) is a description of a computation, not the result itself. Expressions are combined to build transformation pipelines:
import { col, lit } from "molniya";
// These are expressions - descriptions of computations
const priceExpr = col("price"); // Reference to "price" column
const taxExpr = lit(0.1); // Literal value 0.1
const totalExpr = priceExpr.mul(1.1); // Arithmetic expression
// Nothing is computed yet - we're just building the planExpression Types
Column References
Reference existing columns using col():
import { col } from "molniya";
col("id") // Simple reference
col("price") // Numeric column
col("name") // String columnLiterals
Create constant values using lit():
import { lit } from "molniya";
lit(100) // Number literal
lit("active") // String literal
lit(true) // Boolean literal
lit(null) // Null literal
lit(123456789n) // BigInt literalLiterals are useful for comparisons and calculations:
df.filter(col("status").eq(lit("active")))
.withColumn("tax_rate", lit(0.08))Arithmetic Expressions
Basic Operations
import { col, add, sub, mul, div, mod, neg } from "molniya";
// Using method chaining
df.withColumn("total", col("price").mul(col("quantity")))
.withColumn("discount", col("total").mul(0.1))
.withColumn("final", col("total").sub(col("discount")))
// Using operator functions
df.withColumn("avg", div(col("sum"), col("count")))
.withColumn("remainder", mod(col("value"), lit(10)))
.withColumn("negative", neg(col("value")))Operator Reference
| Operator | Method | Description |
|---|---|---|
+ | add() | Addition |
- | sub() | Subtraction |
* | mul() | Multiplication |
/ | div() | Division |
% | mod() | Modulo |
- | neg() | Negation |
Comparison Expressions
Comparison Operators
import { col } from "molniya";
// Equal
col("status").eq("active")
// Not equal
col("type").neq("deleted")
// Greater than
col("age").gt(18)
// Greater than or equal
col("score").gte(60)
// Less than
col("price").lt(100)
// Less than or equal
col("quantity").lte(10)Null Checks
// Is null
col("email").isNull()
// Is not null
col("email").isNotNull()Logical Expressions
Combine conditions with logical operators:
import { and, or, not } from "molniya";
// AND - all conditions must be true
and(
col("age").gte(18),
col("status").eq("active")
)
// OR - any condition can be true
or(
col("category").eq("electronics"),
col("category").eq("computers")
)
// NOT - negate a condition
not(col("deleted").eq(true))
// Combining logical operators
and(
col("age").gte(18),
or(
col("country").eq("US"),
col("country").eq("CA")
)
)String Expressions
import { col, length, substring, contains, upper, lower } from "molniya";
// String length
col("name").length()
// Substring
col("phone").substring(0, 3)
// Contains
col("email").contains("@")
// Case conversion
col("name").upper()
col("email").lower()
// Concatenation
col("first").add(" ").add(col("last"))Date/Time Expressions
import { col, year, month, day, addDays, diffDays } from "molniya";
// Extract components
year(col("order_date"))
month(col("order_date"))
day(col("order_date"))
// Date arithmetic
addDays(col("date"), 7)
subDays(col("date"), 30)
diffDays(col("end"), col("start"))Conditional Expressions
When/Then/Otherwise
Create conditional logic:
import { when, col } from "molniya";
// Simple if-then-else
when(col("score").gte(90), "A")
.when(col("score").gte(80), "B")
.when(col("score").gte(70), "C")
.otherwise("F")
// Used in withColumn
df.withColumn("grade",
when(col("score").gte(90), "A")
.when(col("score").gte(80), "B")
.otherwise("C")
)Expression Composition
Expressions can be arbitrarily complex:
import { col, lit, and, or } from "molniya";
// Complex calculation
df.withColumn("discounted_price",
col("price")
.mul(lit(1).sub(col("discount_rate")))
.add(col("tax"))
)
// Complex filter
df.filter(and(
or(
col("category").eq("electronics"),
col("category").eq("computers")
),
col("price").gte(100),
col("in_stock").eq(true)
))Type Casting in Expressions
Convert between types within expressions:
import { col, cast, DType } from "molniya";
// Cast to different type
df.withColumn("price_int", cast(col("price"), DType.int32))
.withColumn("id_string", cast(col("id"), DType.string))Expression Evaluation Context
Expressions are evaluated in different contexts:
Row Context
Most expressions are evaluated once per row:
// Evaluated for each row
df.withColumn("doubled", col("value").mul(2))Aggregation Context
Aggregation expressions collapse multiple rows into one value:
import { sum, avg, count } from "molniya";
// Evaluated across groups
df.groupBy("category", [
{ name: "total", expr: sum("amount") },
{ name: "average", expr: avg("price") },
{ name: "count", expr: count() }
])Expression Optimization
Molniya optimizes expressions before execution:
- Constant folding:
lit(2).add(lit(3))→lit(5) - Predicate pushdown: Filters applied as early as possible
- Common subexpression elimination: Duplicate expressions computed once
Best Practices
- Use
col()for columns: It's type-safe and enables autocompletion - Use
lit()for constants: Makes intent clear - Break complex expressions: Use multiple
withColumncalls for readability - Leverage method chaining: Build expressions fluently
- Consider null handling: Use
isNull()/isNotNull()explicitly
Common Expression Patterns
Percentage Calculation
df.withColumn("percentage",
col("part").div(col("whole")).mul(100)
)Running Total (within groups)
df.groupBy("category", [
{ name: "running_total", expr: sum("amount") }
])Conditional Aggregation
df.groupBy("region", [
{ name: "high_value_count", expr: count().filter(col("amount").gt(1000)) }
])Date Bucketing
df.withColumn("month_bucket",
year(col("date")).mul(100).add(month(col("date")))
)