bitmap
Here is a simple example to illustrate the usage of several aggregate functions in Bitmap. For detailed function definitions or more Bitmap functions, see bitmap-functions.
Create tableβ
The aggregation model is needed when creating table. The data type is bitmap and the aggregation function is bitmap_union.
CREATE TABLE `pv_bitmap` (
  `dt` int(11) NULL COMMENT "",
  `page` varchar(10) NULL COMMENT "",
  `user_id` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `page`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`);
Note: With an amount of data, you'd better create a rollup table corresponding to high-frequent bitmap_union.
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
Data Loadβ
TO_BITMAP (expr): Convert 0 ~ 18446744073709551615 unsigned bigint to bitmap
BITMAP_EMPTY (): Generate empty bitmap columns, used for the default value to be filled in when inserting or inputting
BITMAP_HASH (expr): Convert columns of any type to a bitmap by 32-bit hashing
BITMAP_HASH64 (expr): Convert columns of any type to a bitmap by 64-bit hashing
Stream Loadβ
When inputting data using Stream Load, you can convert the data to a BItmap field as follows:
cat data | curl --location-trusted -u user:passwd -T - \
    -H "columns: dt,page,user_id, user_id=to_bitmap(user_id)" \
    http://host:8410/api/test/testDb/_stream_load
cat data | curl --location-trusted -u user:passwd -T - \
    -H "columns: dt,page,user_id, user_id=bitmap_hash(user_id)" \
    http://host:8410/api/test/testDb/_stream_load
cat data | curl --location-trusted -u user:passwd -T - \
    -H "columns: dt,page,user_id, user_id=bitmap_hash64(user_id)" \
    http://host:8410/api/test/testDb/_stream_load
cat data | curl --location-trusted -u user:passwd -T - \
    -H "columns: dt,page,user_id, user_id=bitmap_empty()" \
    http://host:8410/api/test/testDb/_stream_load
Insert Intoβ
When inputting data using Insert Into, you need to select the corresponding mode based on the type of columns in the source table.
- id2's column type in source table is bitmap
insert into bitmap_table1
select id, id2 from bitmap_table2;
- id2's column type in target table is bitmap
insert into bitmap_table1 (id, id2)
values (1001, to_bitmap(1000))
, (1001, to_bitmap(2000));
- id2's column type in source table is bitmap, and is the result of aggregation using bit_map_union().
insert into bitmap_table1
select id, bitmap_union(id2) from bitmap_table2 group by id;
- id2's column type in source table is INT, and the bitmap type is generated by to_bitmap().
insert into bitmap_table1
select id, to_bitmap(id2) from table;
- id2's column type in source table is STRING, and the bitmap type is generated by bitmap_hash().
insert into bitmap_table1
select id, bitmap_hash(id2) from table;
- id2's column type in source table is STRING, and the bitmap type is generated by bitmap_hash64().
insert into bitmap_table1
select id, bitmap_hash64(id2) from table;
Data Queryβ
Syntaxβ
``BITMAP_UNION (expr)`: Calculate the union of the input Bitmaps, and returns the new Bitmap.
BITMAP_UNION_COUNT (expr): Calculate the union of the input Bitmaps, and returns its cardinality, equivalent to BITMAP_COUNT (BITMAP_UNION (expr)). It is recommended to use the BITMAP_UNION_COUNT function first, for its performance is better than BITMAP_COUNT (BITMAP_UNION (expr)).
BITMAP_UNION_INT (expr): Calculate the number of different values in columns of type TINYINT, SMALLINT and INT, return the value same as COUNT (DISTINCT expr).
INTERSECT_COUNT (bitmap_column_to_count, filter_column, filter_values ...): Calculate the cardinality of the intersection of multiple bitmaps that satisfy filter_column condition. bitmap_column_to_count is a column of type bitmap, filter_column is a column of varying dimensions, and filter_values is a list of dimension values.
BITMAP_INTERSECT(expr): Calculate the intersection of this group of bitmap values and returns a new bitmap.
Exampleβ
The following SQL uses the pv_bitmap table above as an example:
Calculate the deduplicated value for user_id:
select bitmap_union_count(user_id)
from pv_bitmap;
select bitmap_count(bitmap_union(user_id))
from pv_bitmap;
Calculate the deduplicated value of id:
select bitmap_union_int(id)
from pv_bitmap;
Calculate the retention of user_id:
select intersect_count(user_id, page, 'game') as game_uv,
    intersect_count(user_id, page, 'shopping') as shopping_uv,
    intersect_count(user_id, page, 'game', 'shopping') as retention -- Number of users that access both the 'game' and 'shopping' pages
from pv_bitmap
where page in ('game', 'shopping');
keywordβ
BITMAP,BITMAP_COUNT,BITMAP_EMPTY,BITMAP_UNION,BITMAP_UNION_INT,TO_BITMAP,BITMAP_UNION_COUNT,INTERSECT_COUNT,BITMAP_INTERSECT