SAP Tech

Optimizing SAP HANA Column Store Compression for Real-Time Analytics

AI Contributor
2/10/2026
7 min
Tags: SAP HANA, column store, compression, performance tuning, real-time analytics

Introduction

Real‑time analytics on SAP HANA thrives on two pillars: speed and efficiency. While the in‑memory architecture delivers sub‑second query response times, the underlying column store compression determines how much data can be kept in memory, how quickly it can be scanned, and ultimately how scalable your analytics platform becomes.

In this article we’ll dive deep into the mechanics of SAP HANA’s column‑store compression, uncover common pitfalls, and walk through actionable steps you can apply today to squeeze every last byte of storage efficiency without sacrificing query performance. Whether you’re a data modeler, a DBA, or an ABAP developer responsible for data provisioning, the techniques below will help you keep the column store lean, hot, and ready for the next wave of real‑time insights.


Understanding SAP HANA Column Store Basics

Columnar vs. Row‑Based Storage

SAP HANA stores tables in two physical formats:

Format         Ideal Use‑Case                                      Compression Potential
Row Store      Transactional (OLTP) workloads, frequent updates    Limited (mostly dictionary)
Column Store   Analytical (OLAP) workloads, bulk reads             High (multiple algorithms)

In the column store, each column is stored contiguously, enabling vectorized processing and massively parallel scans. Because values of the same column often exhibit redundancy (e.g., repeated status codes, timestamps with small deltas), HANA can apply a suite of compression algorithms that dramatically reduce the memory footprint.

Compression Fundamentals

At its core, HANA’s column store applies a pipeline of compression techniques:

  1. Dictionary Encoding – maps distinct values to integer keys.
  2. Run‑Length Encoding (RLE) – collapses consecutive identical keys.
  3. Delta Encoding – stores the difference between successive values.
  4. Prefix/Suffix Compression – shares common string prefixes or suffixes.
  5. Bit‑Packing – packs integer keys into the smallest possible bit width.

The engine automatically selects the best combination per column, but table design and data‑type choices can tip the scales in favor of more aggressive compression.
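To make the pipeline concrete, here is a toy sketch of the dictionary, RLE, and bit-packing stages (illustrative Python only; HANA's actual implementation is internal and differs in detail):

```python
from math import ceil, log2

def compress_column(values):
    """Toy pipeline: dictionary encoding -> run-length encoding -> bit width."""
    # 1. Dictionary encoding: every distinct value gets a small integer key
    dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
    keys = [dictionary[v] for v in values]
    # 2. Run-length encoding: collapse consecutive identical keys into [key, count]
    runs = []
    for k in keys:
        if runs and runs[-1][0] == k:
            runs[-1][1] += 1
        else:
            runs.append([k, 1])
    # 3. Bit-packing: the smallest bit width that can represent every key
    bits = max(1, ceil(log2(len(dictionary))))
    return dictionary, runs, bits

col = ["OPEN"] * 4 + ["SHIPPED"] * 3 + ["CLOSED"] * 2
dictionary, runs, bits = compress_column(col)
print(dictionary)  # {'CLOSED': 0, 'OPEN': 1, 'SHIPPED': 2}
print(runs)        # [[1, 4], [2, 3], [0, 2]] - three runs instead of nine values
print(bits)        # 2 - each key needs only 2 bits
```

Nine status strings shrink to a three-entry dictionary plus three runs of 2-bit keys; the same compounding effect is what keeps real columns small.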


Why Compression Matters for Real‑Time Analytics

  1. Memory Economy – HANA’s pricing is memory‑driven. A 30 % reduction in column size can translate to tangible cost savings.
  2. Cache‑Friendly Scans – Smaller columns fit better into CPU caches, reducing latency for analytic queries.
  3. I/O Reduction – Even though HANA is in‑memory, data is persisted to disk for durability. Less data to write/read = faster recovery and backup.
  4. Parallelism Gains – With fewer bytes to process, each worker thread can finish its slice quicker, improving overall throughput.

In short, compression is not a nice‑to‑have; it’s a performance enabler for any real‑time analytics scenario.


Core Compression Techniques in HANA

Dictionary Encoding

When a column contains a limited set of distinct values (e.g., country codes, status flags), HANA builds a dictionary mapping each unique value to a small integer key. The column itself stores only these keys.

-- Example: Inspect distinct-value count and the chosen compression for a column
SELECT TABLE_NAME, COLUMN_NAME, DISTINCT_COUNT, COMPRESSION_TYPE
FROM   M_CS_COLUMNS
WHERE  TABLE_NAME = 'SALES_ORDER' AND COLUMN_NAME = 'ORDER_STATUS';

The fewer distinct values a column holds, the smaller its dictionary: a three‑character country code compresses far better than a free‑text country name.

Run‑Length Encoding (RLE)

If a column’s dictionary keys repeat consecutively (common in time‑series data where a status holds for many rows), HANA stores the value once together with a run length.

-- Check whether the engine chose RLE and how well the column compresses
SELECT TABLE_NAME, COLUMN_NAME, COMPRESSION_TYPE,
       UNCOMPRESSED_SIZE, MEMORY_SIZE_IN_TOTAL
FROM   M_CS_COLUMNS
WHERE  TABLE_NAME = 'SENSOR_READINGS' AND COLUMN_NAME = 'STATUS';

Delta Encoding

Numeric columns that increase monotonically (e.g., timestamps, IDs) are stored as the difference from the previous value. Small deltas compress well with bit‑packing.

-- Inspect the compression chosen for a monotonically increasing column
SELECT TABLE_NAME, COLUMN_NAME, COMPRESSION_TYPE,
       UNCOMPRESSED_SIZE, MEMORY_SIZE_IN_TOTAL
FROM   M_CS_COLUMNS
WHERE  TABLE_NAME = 'EVENT_LOG' AND COLUMN_NAME = 'EVENT_TIME';
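The principle is easy to see in a small sketch (illustrative Python, not HANA internals): storing the first value plus the differences turns large timestamps into small integers that bit-pack tightly.

```python
def delta_encode(values):
    """Store the first value, then only the difference to the predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

# Epoch-millisecond timestamps arriving roughly once per second
ts = [1700000000000, 1700000001000, 1700000002100, 1700000002900]
deltas = delta_encode(ts)
print(deltas[1:])  # [1000, 1100, 800] - small numbers instead of 13-digit ones
assert delta_decode(deltas) == ts
```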

Prefix/Suffix Compression

Long strings often share common prefixes (e.g., product codes PROD-2023-001). HANA extracts the shared part, storing only the unique suffix per row.
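A minimal sketch of the idea (again illustrative Python, not the engine's on-memory format):

```python
from os.path import commonprefix  # works on any list of strings

def prefix_compress(strings):
    """Store the shared prefix once, plus only each string's unique suffix."""
    prefix = commonprefix(strings)
    suffixes = [s[len(prefix):] for s in strings]
    return prefix, suffixes

def prefix_decompress(prefix, suffixes):
    return [prefix + s for s in suffixes]

codes = ["PROD-2023-001", "PROD-2023-002", "PROD-2023-017"]
prefix, suffixes = prefix_compress(codes)
print(prefix)    # PROD-2023-0
print(suffixes)  # ['01', '02', '17']
assert prefix_decompress(prefix, suffixes) == codes
```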


Assessing Current Compression Health

Before you start tweaking, you need a baseline. SAP HANA provides a set of system views that surface compression statistics at the table, column, and partition level.

Key System Views

View              Purpose
M_CS_TABLES       High‑level, per‑table memory and record statistics.
M_CS_COLUMNS      Per‑column details: compression type, distinct count, compressed vs. uncompressed size.
M_CS_PARTITIONS   Partition‑specific statistics, useful when tables are horizontally partitioned.

Sample Diagnostic Query

/* Overall compression health for all column-store tables */
SELECT c.SCHEMA_NAME,
       c.TABLE_NAME,
       SUM(c.UNCOMPRESSED_SIZE)/1024/1024    AS UNCOMPRESSED_MB,
       SUM(c.MEMORY_SIZE_IN_TOTAL)/1024/1024 AS COMPRESSED_MB,
       ROUND(100 * (1 - SUM(c.MEMORY_SIZE_IN_TOTAL) / SUM(c.UNCOMPRESSED_SIZE)), 1) AS "COMPRESSION_%"
FROM   M_CS_COLUMNS c
GROUP  BY c.SCHEMA_NAME, c.TABLE_NAME
HAVING SUM(c.UNCOMPRESSED_SIZE) > 0
ORDER  BY "COMPRESSION_%" ASC   -- worst-compressed tables first
LIMIT  20;

Tables that combine a large footprint with a low compression percentage are the prime candidates for re‑evaluation: check their data types, their delta‑merge status, and whether partitioning could give the engine smaller, more uniform compression domains.


Actionable Optimization Strategies

Below are concrete steps you can take, grouped by impact level. Each recommendation includes a code snippet or configuration tip you can apply immediately.

1. Choose the Right Data Types

Column Type         Recommended Choice                                 Reason
Integer IDs         INTEGER (BIGINT only if values exceed 2 billion)   Smaller integers bit‑pack tighter.
Fixed‑length text   NVARCHAR(n) with n set to the actual length        Keeps dictionary entries short.
Variable text       NVARCHAR with a tight maximum length               Reduces dictionary overhead.
Dates/times         DATE or TIMESTAMP, not strings                     Delta encoding works best on the internal numeric representation.

ABAP Example – Defining a CDS View with Optimal Types

@AbapCatalog.sqlViewName: 'ZV_SALES_ORDER'
@AbapCatalog.compiler.compareFilter: true
define view Z_C_SalesOrder
  as select from sflight
{
    key carrid    as CarrierID,          // CHAR(3) – fits dictionary
    key connid    as ConnectionID,       // INTEGER – delta friendly
    key fldate    as FlightDate,        // DATE
    price         as Price,             // DECIMAL(10,2)
    seatsmax      as SeatsMax,          // SMALLINT
    seatsocc      as SeatsOccupied
}

2. Leverage Horizontal Partitioning

Partitioning a massive fact table by a time‑based criterion (e.g., month of the sales date) not only improves query pruning but also creates independent compression domains. Each partition builds its own dictionaries, often yielding a higher overall compression ratio.

-- Create a partitioned table (monthly partitions)
CREATE COLUMN TABLE SALES_FACT (
    SALES_ID      BIGINT,
    SALES_DATE    DATE,
    REGION        NVARCHAR(20),
    PRODUCT_ID    BIGINT,
    QUANTITY      INTEGER,
    REVENUE       DECIMAL(15,2)
)
PARTITION BY RANGE (SALES_DATE) (
    PARTITION '2022-01-01' <= VALUES < '2022-02-01',
    PARTITION '2022-02-01' <= VALUES < '2022-03-01',
    PARTITION '2022-03-01' <= VALUES < '2022-04-01',
    -- … continue for each month …
    PARTITION OTHERS
);

After loading data, trigger a delta merge so the new rows move into the read‑optimized main store, then force the engine to re‑evaluate its compression choices.

MERGE DELTA OF SALES_FACT;
UPDATE SALES_FACT WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'FORCE');

3. Adjust Table Design for Compression‑Friendliness

  • Avoid unnecessary wide VARCHAR columns; trim them to the smallest possible length.
  • Normalize high‑cardinality strings into surrogate keys (e.g., CustomerName → CustomerID). The surrogate key benefits from dictionary + RLE, while the original string can be stored in a separate dimension table.
  • Remove redundant columns that are only used for reporting but not for analytics (consider materialized views instead).
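The surrogate-key idea from the list above can be sketched as follows (illustrative Python; in a real model this is a dimension table plus an integer foreign key in the fact table):

```python
def normalize(fact_rows, column):
    """Replace a high-cardinality string column with integer surrogate keys."""
    dimension = {}   # value -> surrogate key (this becomes the dimension table)
    keyed_rows = []
    for row in fact_rows:
        value = row[column]
        key = dimension.setdefault(value, len(dimension))
        keyed_rows.append({**row, column: key})
    return keyed_rows, dimension

facts = [
    {"customer": "Acme GmbH", "revenue": 100},
    {"customer": "Beta AG",   "revenue": 250},
    {"customer": "Acme GmbH", "revenue": 75},
]
keyed, dim = normalize(facts, "customer")
print(dim)    # {'Acme GmbH': 0, 'Beta AG': 1}
print(keyed)  # fact rows now carry small, highly compressible integer keys
```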

4. Use Compression‑Friendly Load Patterns

When loading bulk data via SAP HANA Smart Data Integration (SDI), SLT, or ABAP ODP, configure the pipeline to bulk‑load rather than row‑by‑row inserts. Bulk loading allows the engine to build dictionaries once per batch, which is far more efficient than constantly updating them.
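The batching pattern itself is database-agnostic and easy to sketch with Python's DB-API (sqlite3 stands in for HANA here; against HANA you would use the hdbcli driver with the same executemany pattern):

```python
import sqlite3

# In-memory stand-in target; with HANA this would be an hdbcli connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_fact (id INTEGER, region TEXT, revenue REAL)")

rows = [(i, "EMEA" if i % 2 else "APJ", i * 1.5) for i in range(50_000)]

# Bulk load: one round trip per batch instead of one per row, so the
# target engine can maintain its internal structures once per batch
BATCH = 10_000
for start in range(0, len(rows), BATCH):
    conn.executemany(
        "INSERT INTO orders_fact VALUES (?, ?, ?)", rows[start:start + BATCH]
    )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM orders_fact").fetchone()[0]
print(count)  # 50000
```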

Illustrative SDI Job Configuration (Bulk Load) – exact option names vary by tooling and release

{
  "source": {
    "type": "jdbc",
    "connection": "oracle_source",
    "query": "SELECT * FROM ORDERS"
  },
  "target": {
    "type": "hana",
    "table": "ORDERS_FACT",
    "bulkLoad": true,
    "batchSize": 50000
  },
  "options": {
    "compression": "auto",
    "rebuildDictionary": true
  }
}

5. Tweak Automatic Compression Settings

Automatic compression optimization is governed by ini‑file parameters; for example, the [optimize_compression] section of indexserver.ini controls whether and how the engine re‑evaluates column compression.

-- Show current compression-related parameters (read-only monitoring view)
SELECT * FROM M_INIFILE_CONTENTS
WHERE  FILE_NAME = 'indexserver.ini' AND SECTION = 'optimize_compression';

-- Parameters are changed via ALTER SYSTEM, never by writing to monitoring views
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('optimize_compression', 'active') = 'yes' WITH RECONFIGURE;

Caution: System‑level configuration changes affect every workload on the instance. Test in a non‑production environment first and monitor memory usage closely.

6. Manual Recompression for Stagnant Tables

If a table has undergone heavy updates or deletions, its compression can degrade over time. Force a fresh compression pass with a delta merge followed by a compression‑optimization run.

-- Merge outstanding delta data into the read-optimized main store
MERGE DELTA OF SALES_FACT;

-- Re-run the compression decision for every column of the table
UPDATE SALES_FACT WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'FORCE');

Both statements run online; the merge only briefly locks the table while switching over to the new main store.

Advanced Tuning

Compression on Row Store vs. Column Store

While the column store is the default for analytical workloads, some use cases (e.g., write‑intensive master data) still rely on the row store, which offers no comparable compression. If a row‑store table's access pattern becomes read‑mostly, consider:

  • Converting it to the column store so the full compression pipeline applies.
  • Creating new analytical tables explicitly with CREATE COLUMN TABLE rather than relying on defaults.

-- Convert an existing row-store table to the column store
ALTER TABLE EMPLOYEE_MASTER COLUMN;

SAP HANA Native Storage Extension (NSE) with Compression

NSE extends the column store with a disk‑backed "warm" tier: tables, partitions, or columns marked as page loadable reside primarily on disk in a buffer cache and are loaded page‑wise on access, while hot data stays fully in memory. Column‑store compression continues to apply to page‑loadable data, so the warm tier benefits from the same dictionaries and encodings. The tier is assigned with plain SQL:

-- Mark an entire table (cascading to its partitions) as page-loadable warm data
ALTER TABLE SALES_FACT PAGE LOADABLE CASCADE;

-- Or move only a rarely queried column to the warm tier
ALTER TABLE SALES_FACT ALTER (REGION NVARCHAR(20) PAGE LOADABLE);

Integration with ABAP: ODP and SLT

When provisioning data from SAP ECC or S/4HANA, compression is applied automatically on the HANA side as soon as the target is a column table; there is no compression switch to set in the replication layer. What you can influence with SLT is the target table type, structure, and partitioning, which are maintained in the SLT cockpit (transaction LTRC) and the advanced replication settings (transaction LTRS). A well‑partitioned column‑store target ensures the initial load already compresses well.

ABAP Example – Bulk Loading via ADBC

REPORT zpush_sales_data.

DATA: lt_sales TYPE TABLE OF zs_sales_raw,
      lr_sales TYPE REF TO data,
      lo_conn  TYPE REF TO cl_sql_connection,
      lo_stmt  TYPE REF TO cl_sql_prepared_statement.

SELECT * FROM zs_sales_raw INTO TABLE lt_sales UP TO 100000 ROWS.

lo_conn = cl_sql_connection=>get_connection( ).
lo_stmt = lo_conn->prepare_statement(
            |INSERT INTO sales_fact VALUES ( ?, ?, ?, ?, ?, ? )| ).

" Bind the whole internal table once: ADBC sends it as a single array
" insert, so the column store updates its dictionaries per batch
" instead of once per row.
GET REFERENCE OF lt_sales INTO lr_sales.
lo_stmt->set_param_table( lr_sales ).
lo_stmt->execute_update( ).
lo_conn->commit( ).

Array binding via set_param_table turns 100,000 single‑row inserts into one bulk operation: the delta store receives the batch in one pass, and the next delta merge compresses it with far less dictionary churn than row‑by‑row inserts would cause.


Monitoring and Continuous Improvement

SAP HANA Cockpit Dashboards

  • Memory Overview – shows column‑store memory vs. row‑store memory.
  • Compression Ratio – visualizes compression per schema.
  • Hot Tables – identifies tables with high read/write activity; prioritize them for recompression.

Setting Alerts

Alert definitions are maintained in the SAP HANA cockpit (or via the embedded statistics service), not by inserting into monitoring views. A custom check like the following can back a scheduled alert:

-- Flag the table when its compression saving falls below 30 %
SELECT TABLE_NAME
FROM   M_CS_COLUMNS
WHERE  TABLE_NAME = 'SALES_FACT'
GROUP  BY TABLE_NAME
HAVING SUM(MEMORY_SIZE_IN_TOTAL) / SUM(UNCOMPRESSED_SIZE) > 0.70;

Periodic Recompression Schedule

A typical maintenance window could include:

Frequency   Action
Daily       Run the M_CS_TABLES health check; flag tables whose compression dropped by more than 5 %.
Weekly      Rebuild partitions that exceed a defined row‑change threshold (e.g., > 10 %).
Monthly     Full‑table recompression for static dimension tables.
Quarterly   Review data‑type choices and partitioning strategy; adjust ini‑file settings if needed.

Automation can be achieved via SQLScript procedures combined with SAP HANA XS or SAP Cloud Platform jobs.

CREATE PROCEDURE REBUILD_TABLE_COMPRESSION (IN iv_table_name NVARCHAR(128))
LANGUAGE SQLSCRIPT
AS
BEGIN
  DECLARE sql_stmt NVARCHAR(500);
  -- Merge outstanding delta data, then force a compression re-evaluation
  sql_stmt := 'MERGE DELTA OF "' || :iv_table_name || '"';
  EXECUTE IMMEDIATE :sql_stmt;
  sql_stmt := 'UPDATE "' || :iv_table_name
           || '" WITH PARAMETERS (''OPTIMIZE_COMPRESSION'' = ''FORCE'')';
  EXECUTE IMMEDIATE :sql_stmt;
END;

Key Takeaways

  • Compression is a performance lever, not just a storage saver; it directly impacts query latency and memory cost.
  • Data‑type discipline (tight VARCHAR lengths, appropriate numeric types) yields the biggest gains.
  • Partitioning creates isolated, per‑partition dictionaries, which can raise the overall compression ratio noticeably.
  • Use bulk‑load pathways (SDI, SLT, ODP) with compression hints to avoid incremental dictionary churn.
  • Regularly monitor compression health via system views (M_CS_TABLES, M_CS_COLUMNS) and set proactive alerts.
  • Recompression (a delta merge followed by a compression‑optimization pass) should be part of a scheduled maintenance plan, especially after heavy DML activity.
  • Advanced features like NSE and custom ini‑file settings can push compression further, but require careful testing.

Future Outlook

The next generation of SAP HANA is already embracing AI‑driven adaptive compression. Early prototypes use machine‑learning models to predict the optimal combination of dictionary size, RLE thresholds, and delta encoding per column based on data‑distribution trends.

AI Contributor

SAP Expert and Training Specialist with 6+ years of experience. Helped 500+ professionals advance their SAP careers.