DataFrame Class
The DataFrame class is the core data structure in Molniya. It represents a collection of data organized into named columns.
Overview
class DataFrame<T = Record<string, unknown>> {
  // Properties
  readonly schema: Schema;
  readonly columnNames: string[];
  // Methods
  filter(expr: Expr): DataFrame<T>;
  select<K extends keyof T>(...columns: (K & string)[]): DataFrame<Pick<T, K>>;
  // ... and more
}
The generic type parameter T represents the row type. Transformations such as select() and drop() compute their result type from T, so TypeScript can infer the row type after each step.
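The following standalone sketch shows how the row type narrows step by step. `MiniFrame` is a hypothetical stand-in that holds plain row objects. It is not Molniya's DataFrame. It only copies the shape of the documented select() and drop() signatures, so you can see TypeScript's built-in `Pick` and `Omit` at work.

```typescript
// Hypothetical stand-in over plain row objects; not Molniya's DataFrame.
class MiniFrame<T extends Record<string, unknown>> {
  readonly rows: T[];

  constructor(rows: T[]) {
    this.rows = rows;
  }

  // Mirrors the documented select(): the row type narrows to Pick<T, K>.
  select<K extends keyof T & string>(...columns: K[]): MiniFrame<Pick<T, K>> {
    return new MiniFrame(
      this.rows.map((row) => {
        const out: Record<string, unknown> = {};
        for (const c of columns) out[c] = row[c];
        return out as Pick<T, K>;
      }),
    );
  }

  // Mirrors the documented drop(): the row type becomes Omit<T, K>.
  drop<K extends keyof T & string>(...columns: K[]): MiniFrame<Omit<T, K>> {
    return new MiniFrame(
      this.rows.map((row) => {
        const out: Record<string, unknown> = { ...row };
        for (const c of columns) delete out[c];
        return out as Omit<T, K>;
      }),
    );
  }
}

type Person = { id: number; name: string; age: number };
const people = new MiniFrame<Person>([
  { id: 1, name: "Ada", age: 36 },
  { id: 2, name: "Lin", age: 29 },
]);

const ids = people.select("id"); // MiniFrame<Pick<Person, "id">>
const noAge = people.drop("age"); // MiniFrame<Omit<Person, "age">>
// people.select("salary"); // compile-time error: "salary" is not a key of Person
console.log(ids.rows, noAge.rows);
```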
Properties
schema
Returns the current schema of the DataFrame.
const df = await readCsv("data.csv", schema);
console.log(df.schema);
// { columns: [...], columnMap: Map, rowSize: 24, columnCount: 4 }
columnNames
Returns an array of column names.
const df = await readCsv("data.csv", schema);
console.log(df.columnNames);
// ['id', 'name', 'age', 'salary']
Static Methods
DataFrame.empty()
Creates an empty DataFrame with a given schema.
import { DataFrame, createSchema, unwrap, DType } from "molniya";
const schema = unwrap(createSchema({
  id: DType.int32,
  name: DType.string
}));
const empty = DataFrame.empty(schema, null);
DataFrame.fromChunks()
Creates a DataFrame from existing chunks (advanced use).
import { DataFrame } from "molniya";
const df = DataFrame.fromChunks(chunks, schema, dictionary);
DataFrame.fromStream()
Creates a DataFrame from an async iterable of chunks (advanced use).
import { DataFrame } from "molniya";
const df = DataFrame.fromStream(asyncIterable, schema, dictionary);
Transformation Methods
These methods return a new DataFrame with the transformation applied. They are evaluated lazily: no data is processed until an action such as collect() is called.
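You can picture lazy evaluation as a plan that records steps and runs them only when an action executes it. The sketch below is a hypothetical, simplified model in plain TypeScript. `LazyFrame` and its `collect()` are stand-ins, not Molniya's internals. It counts predicate calls to show that nothing runs before `collect()`.

```typescript
// Hypothetical illustration of lazy evaluation; not Molniya's internals.
type Row = Record<string, unknown>;
type Step = (rows: Row[]) => Row[];

class LazyFrame {
  private readonly source: Row[];
  private readonly steps: Step[];

  constructor(source: Row[], steps: Step[] = []) {
    this.source = source;
    this.steps = steps;
  }

  // Transformations only append a step to a copy of the plan.
  filter(predicate: (row: Row) => boolean): LazyFrame {
    return new LazyFrame(this.source, [...this.steps, (rows) => rows.filter(predicate)]);
  }

  limit(n: number): LazyFrame {
    return new LazyFrame(this.source, [...this.steps, (rows) => rows.slice(0, n)]);
  }

  // The action: the recorded steps run here, in order.
  collect(): Row[] {
    return this.steps.reduce((rows, step) => step(rows), this.source);
  }
}

let predicateCalls = 0;
const plan = new LazyFrame([{ age: 12 }, { age: 30 }, { age: 45 }])
  .filter((row) => {
    predicateCalls++;
    return (row.age as number) >= 18;
  })
  .limit(1);

console.log(predicateCalls); // 0: building the plan ran nothing
const adults = plan.collect();
console.log(predicateCalls, adults); // 3 [ { age: 30 } ]
```

In this naive model the predicate still runs on all three rows even though only one is kept. A query optimizer could stop early, but this sketch does not.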
filter()
Select rows matching a condition.
filter(expr: Expr): DataFrame<T>
Example:
df.filter(col("age").gte(18))
df.filter(and(col("status").eq("active"), col("balance").gt(0)))
where()
Alias for filter().
where(expr: Expr): DataFrame<T>
select()
Select specific columns.
select<K extends keyof T>(...columns: (K & string)[]): DataFrame<Pick<T, K>>
Example:
df.select("id", "name")
df.select("id") // Inferred as DataFrame<Pick<T, "id">>, e.g. DataFrame<{ id: number }> if id is a number column
drop()
Drop specific columns (keeps all others).
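drop<K extends keyof T>(...columns: (K & string)[]): DataFrame<Omit<T, K>>
Example:
df.drop("temp_column", "internal_id")
rename()
Rename columns. The keys of mapping are existing column names and the values are the new names. The result is typed as DataFrame<Record<string, unknown>>, so the specific row type is not carried through a rename.
rename(mapping: Partial<Record<keyof T, string>>): DataFrame<Record<string, unknown>>
Example:
df.rename({ firstName: "first_name", lastName: "last_name" })
withColumn()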
Add a computed column.
withColumn<K extends string>(name: K, expr: Expr): DataFrame<T & Record<K, unknown>>
Example:
df.withColumn("full_name", col("first").add(" ").add(col("last")))
df.withColumn("discounted", col("price").mul(0.9))
withColumns()
Add multiple computed columns.
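The withColumns() example in this section computes total from tax, a column created in the same call. That suggests the columns are added in order. The plain-TypeScript sketch below models that assumption. `addColumnsInOrder` and `ColumnSpec` are hypothetical names, and each expression is an ordinary function rather than a Molniya Expr.

```typescript
// Hypothetical model: columns are added left to right (an assumption), so a
// later expression can read a column created earlier in the same call.
type Row = Record<string, number>;
type ColumnSpec = { name: string; expr: (row: Row) => number };

function addColumnsInOrder(rows: Row[], columns: ColumnSpec[]): Row[] {
  return rows.map((row) => {
    const out: Row = { ...row };
    for (const { name, expr } of columns) {
      out[name] = expr(out); // sees the original columns plus any added so far
    }
    return out;
  });
}

const withTotals = addColumnsInOrder([{ amount: 100 }, { amount: 250 }], [
  { name: "tax", expr: (r) => r.amount / 10 }, // 10% tax
  { name: "total", expr: (r) => r.amount + r.tax },
]);
console.log(withTotals); // rows now also carry tax (10, 25) and total (110, 275)
```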
withColumns(columns: ComputedColumn[] | Record<string, Expr>): DataFrame
Example:
df.withColumns([
  { name: "tax", expr: col("amount").mul(0.1) },
  { name: "total", expr: col("amount").add(col("tax")) }
])
// Or with object syntax:
df.withColumns({
  tax: col("amount").mul(0.1),
  total: col("amount").add(col("tax"))
})
cast()
Cast a column to a different type.
cast(column: keyof T, targetDType: DType): DataFrame<T>
Example:
df.cast("id", DType.int64)
df.cast("price", DType.float64)
fillNull()
Replace null values in a column.
fillNull(column: keyof T, fillValue: number | bigint | string | boolean): DataFrame<T>
Example:
df.fillNull("email", "unknown@example.com")
df.fillNull("age", 0)
dropNull()
Remove rows with null values.
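The three call forms differ only in which columns are checked for nulls. The sketch below works over plain row objects. `dropNullRows` is a hypothetical helper that mirrors the documented behavior (no argument: any column; a name or a list: only those columns). It is not Molniya's implementation.

```typescript
// Hypothetical model of dropNull() over plain row objects.
type Row = Record<string, unknown>;

function dropNullRows(rows: Row[], columns?: string | string[]): Row[] {
  return rows.filter((row) => {
    // No argument: check every column. Otherwise check only the named ones.
    const checked =
      columns === undefined ? Object.keys(row) : Array.isArray(columns) ? columns : [columns];
    return checked.every((c) => row[c] !== null);
  });
}

const contacts: Row[] = [
  { firstName: "Ada", lastName: "Lovelace", email: null },
  { firstName: null, lastName: "Hopper", email: "grace@example.com" },
  { firstName: "Alan", lastName: "Turing", email: "alan@example.com" },
];

console.log(dropNullRows(contacts).length); // 1
console.log(dropNullRows(contacts, "email").length); // 2
console.log(dropNullRows(contacts, ["firstName", "lastName"]).length); // 2
```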
dropNull(columns?: keyof T | (keyof T)[]): DataFrame<T>
Example:
df.dropNull() // Drop rows with any null
df.dropNull("email") // Drop rows where email is null
df.dropNull(["firstName", "lastName"]) // Drop rows where either column is null
Sorting Methods
sort()
Sort by one or more columns.
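Sorting by several keys means comparing on the first key and using the next key only to break ties. The sketch below shows that comparator logic on plain objects. `sortRows` and `compareValues` are hypothetical helpers. The `{ column, descending }` shape follows the explicit SortKey form in the examples; treating a missing `descending` as ascending is an assumption.

```typescript
// Hypothetical model of multi-key sorting; not Molniya's implementation.
type Value = string | number;
type Row = Record<string, Value>;
type SortKey = { column: string; descending?: boolean }; // missing descending = ascending (assumed)

function compareValues(x: Value, y: Value): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0;
}

function sortRows(rows: Row[], keys: SortKey[]): Row[] {
  // Sort a copy so the input array is not mutated.
  return [...rows].sort((a, b) => {
    for (const { column, descending } of keys) {
      const cmp = compareValues(a[column], b[column]);
      if (cmp !== 0) return descending ? -cmp : cmp; // first non-tied key decides
    }
    return 0; // tied on every key
  });
}

const people: Row[] = [
  { lastName: "Smith", firstName: "Zoe", amount: 10 },
  { lastName: "Jones", firstName: "Ann", amount: 30 },
  { lastName: "Smith", firstName: "Amy", amount: 20 },
];

console.log(sortRows(people, [{ column: "lastName" }, { column: "firstName" }]).map((r) => r.firstName));
// [ 'Ann', 'Amy', 'Zoe' ]
console.log(sortRows(people, [{ column: "amount", descending: true }]).map((r) => r.amount));
// [ 30, 20, 10 ]
```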
sort(keys: string | string[] | SortKey[]): DataFrame<T>
Example:
df.sort("name") // Ascending
df.sort(desc("amount")) // Descending
df.sort(["lastName", "firstName"]) // Multiple columns
df.sort([{ column: "amount", descending: true }]) // Explicit syntax
orderBy()
Alias for sort().
orderBy(keys: string | string[] | SortKey[]): DataFrame<T>
Limiting Methods
limit()
Limit the number of rows.
limit(n: number): DataFrame<T>
Example:
df.limit(100) // First 100 rows
head()
Same as limit(), except that n is optional and defaults to 5.
head(n?: number): DataFrame<T>
Example:
df.head() // First 5 rows
df.head(10) // First 10 rows
slice()
Skip the first start rows, then take the next count rows.
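In array terms, slice(start, count) takes the half-open index range [start, start + count). Below is a minimal sketch with a hypothetical `sliceRows` helper over a plain array. It reproduces the slice(100, 50) example in this section.

```typescript
// Hypothetical model of slice(start, count) over a plain array.
function sliceRows<T>(rows: T[], start: number, count: number): T[] {
  // Skip `start` rows, then keep the next `count` (fewer if the data runs out).
  return rows.slice(start, start + count);
}

const rowNumbers = Array.from({ length: 200 }, (_, i) => i + 1); // rows numbered 1..200
const page = sliceRows(rowNumbers, 100, 50);
console.log(page[0], page[page.length - 1], page.length); // 101 150 50
```

In this model, limit(n) is sliceRows(rows, 0, n). Page p (0-based) of size s is sliceRows(rows, p * s, s).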
slice(start: number, count: number): DataFrame<T>
Example:
df.slice(100, 50) // Skip 100, take 50 (rows 101-150)
Aggregation Methods
groupBy()
Group rows by key columns for aggregation.
groupBy(keyColumns: string | string[]): RelationalGroupedDataset<T>
Example:
const grouped = df.groupBy("category")
const result = grouped.agg([
  { name: "total", expr: sum("amount") }
])
See GroupBy for more details.
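Conceptually, groupBy() followed by a sum() aggregation splits the rows by key and folds each group into one output row. A DataFrame-level agg() is the same computation with a single group. The sketch below uses a hypothetical `groupSum` helper over plain objects; it is not Molniya's implementation.

```typescript
// Hypothetical model of groupBy(key) + sum(value) over plain objects.
function groupSum<T>(rows: T[], key: (row: T) => string, value: (row: T) => number): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const k = key(row);
    totals.set(k, (totals.get(k) ?? 0) + value(row)); // one running total per group
  }
  return totals;
}

type Sale = { category: string; amount: number };
const sales: Sale[] = [
  { category: "books", amount: 12 },
  { category: "games", amount: 30 },
  { category: "books", amount: 8 },
];

const totalByCategory = groupSum(sales, (s) => s.category, (s) => s.amount);
console.log(totalByCategory); // books: 20, games: 30
```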
agg()
Aggregate over all rows without grouping; the result is a single row.
agg(specs: TypedAggSpec[]): DataFrame
Example:
const totals = await df.agg([
  { name: "total_sales", expr: sum("amount") },
  { name: "avg_price", expr: avg("price") },
  { name: "count", expr: count() }
]).collect();
min() / max() / mean()
Shortcut aggregations that return a single value for one column. They return a Promise, so the result must be awaited.
min(column: keyof T): Promise<number | string | null>
max(column: keyof T): Promise<number | string | null>
mean(column: keyof T): Promise<number | null>
Example:
const minAge = await df.min("age")
const maxAge = await df.max("age")
const avgSalary = await df.mean("salary")
Join Methods
innerJoin()
Inner join with another DataFrame.
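An inner join keeps only pairs of rows whose keys match. A common way to compute an equi-join is a hash join: index one side by its key, then probe that index with the other side. The sketch below shows inner-join semantics this way on plain arrays. Whether Molniya uses a hash join internally is not stated here, and `hashInnerJoin` is a hypothetical helper. It ignores the documented suffix parameter (on a name clash the right value simply overwrites the left one) and does not special-case null keys.

```typescript
// Hypothetical hash-join sketch of inner-join semantics; not Molniya's code.
type Row = Record<string, unknown>;

function hashInnerJoin(left: Row[], right: Row[], leftOn: string, rightOn: string = leftOn): Row[] {
  // Build phase: index the right side by key (one key may map to many rows).
  const index = new Map<unknown, Row[]>();
  for (const r of right) {
    const bucket = index.get(r[rightOn]);
    if (bucket) bucket.push(r);
    else index.set(r[rightOn], [r]);
  }
  // Probe phase: emit one merged row per matching pair; unmatched left rows are dropped.
  const out: Row[] = [];
  for (const l of left) {
    for (const r of index.get(l[leftOn]) ?? []) out.push({ ...l, ...r });
  }
  return out;
}

const orders: Row[] = [
  { orderId: 1, customerId: 10 },
  { orderId: 2, customerId: 20 },
  { orderId: 3, customerId: 99 }, // no matching customer
];
const customers: Row[] = [
  { id: 10, name: "Ada" },
  { id: 20, name: "Lin" },
];

const joined = hashInnerJoin(orders, customers, "customerId", "id");
console.log(joined.map((row) => `${row.orderId}:${row.name}`)); // [ '1:Ada', '2:Lin' ]
```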
innerJoin<U>(other: DataFrame<U>, leftOn: keyof T, rightOn?: keyof U, suffix?: string): Promise<DataFrame<T & U>>
Example:
const result = await orders.innerJoin(customers, "customerId", "id")
leftJoin()
Left join with another DataFrame.
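A left join differs from an inner join only in what happens to left rows without a match: they are kept, and the right-side columns are filled with null. The sketch below shows that on plain arrays. `leftJoinRows` is a hypothetical helper. It reads the right-side column names from the first right row, whereas a real DataFrame would use its schema. Like the inner-join sketch, it does not handle column-name clashes, which the documented suffix parameter exists for.

```typescript
// Hypothetical sketch of left-join semantics; not Molniya's implementation.
type Row = Record<string, unknown>;

function leftJoinRows(left: Row[], right: Row[], leftOn: string, rightOn: string = leftOn): Row[] {
  // Right-side column names (taken from the first right row in this sketch).
  const rightColumns = right.length > 0 ? Object.keys(right[0]) : [];
  const index = new Map<unknown, Row[]>();
  for (const r of right) {
    const bucket = index.get(r[rightOn]);
    if (bucket) bucket.push(r);
    else index.set(r[rightOn], [r]);
  }
  const out: Row[] = [];
  for (const l of left) {
    const matches = index.get(l[leftOn]);
    if (matches) {
      for (const r of matches) out.push({ ...l, ...r });
    } else {
      // Unmatched left rows are kept, with every right-side column set to null.
      const padded: Row = { ...l };
      for (const c of rightColumns) padded[c] = null;
      out.push(padded);
    }
  }
  return out;
}

const orders: Row[] = [
  { orderId: 1, customerId: 10 },
  { orderId: 2, customerId: 20 },
  { orderId: 3, customerId: 99 }, // no matching customer
];
const customers: Row[] = [
  { id: 10, name: "Ada" },
  { id: 20, name: "Lin" },
];

const allOrders = leftJoinRows(orders, customers, "customerId", "id");
console.log(allOrders.map((row) => `${row.orderId}:${row.name}`)); // [ '1:Ada', '2:Lin', '3:null' ]
```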
leftJoin<U>(other: DataFrame<U>, leftOn: keyof T, rightOn?: keyof U, suffix?: string): Promise<DataFrame<T & U>>
semiJoin()
Semi join: returns the rows of the left DataFrame that have at least one match in the right DataFrame. Only the left columns are returned.
semiJoin<U>(other: DataFrame<U>, on: keyof T | (keyof T)[]): Promise<DataFrame<T>>
antiJoin()
Anti join: returns the rows of the left DataFrame that have no match in the right DataFrame.
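Semi join and anti join are complements. For a given key, every left row lands in exactly one of the two results. Neither result duplicates a left row or adds right-side columns, which matches the DataFrame<T> return type. Below is a minimal sketch with hypothetical helpers. It handles a single key column only, whereas the documented on parameter also accepts a list.

```typescript
// Hypothetical sketches of semi and anti join on a single key column.
type Row = Record<string, unknown>;

function semiJoinRows(left: Row[], right: Row[], on: string): Row[] {
  const keys = new Set(right.map((r) => r[on]));
  return left.filter((l) => keys.has(l[on])); // each left row kept at most once
}

function antiJoinRows(left: Row[], right: Row[], on: string): Row[] {
  const keys = new Set(right.map((r) => r[on]));
  return left.filter((l) => !keys.has(l[on]));
}

const customers: Row[] = [{ customerId: 10 }, { customerId: 20 }, { customerId: 30 }];
const orders: Row[] = [{ customerId: 10 }, { customerId: 10 }, { customerId: 30 }];

console.log(semiJoinRows(customers, orders, "customerId").map((c) => c.customerId)); // [ 10, 30 ]
console.log(antiJoinRows(customers, orders, "customerId").map((c) => c.customerId)); // [ 20 ]
```

Customer 10 has two orders but appears only once in the semi-join result. An inner join on the same data would return that customer twice.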
antiJoin<U>(other: DataFrame<U>, on: keyof T | (keyof T)[]): Promise<DataFrame<T>>
crossJoin()
Cross join: the Cartesian product, pairing every left row with every right row.
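The sketch below is a minimal model of Cartesian-product semantics using a hypothetical `crossJoinRows` helper over plain arrays. Like the other join sketches, it does not apply the documented suffix to clashing column names.

```typescript
// Hypothetical sketch of a cross join: every left row paired with every right row.
type Row = Record<string, unknown>;

function crossJoinRows(left: Row[], right: Row[]): Row[] {
  const out: Row[] = [];
  for (const l of left) {
    for (const r of right) out.push({ ...l, ...r });
  }
  return out;
}

const sizes: Row[] = [{ size: "S" }, { size: "M" }, { size: "L" }];
const colors: Row[] = [{ color: "red" }, { color: "blue" }];
const variants = crossJoinRows(sizes, colors);
console.log(variants.length); // 6 (3 * 2)
console.log(variants[0]); // { size: 'S', color: 'red' }
```

The output size is the product of the input sizes, so two 10,000-row inputs already yield 100,000,000 rows.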
crossJoin<U>(other: DataFrame<U>, suffix?: string): Promise<DataFrame<T & U>>
Execution Methods
See Execution Methods for details on collect(), show(), toArray(), etc.
Inspection Methods
See Inspection Methods for details on printSchema(), explain(), etc.