Dear all,
I would like to know whether HANA can use streaming algorithms to compute the results of top-N queries.
Suppose there are two tables, table_a and table_b:
create column table table_a (
    id bigint not null primary key generated by default as identity,
    val_a numeric
);
create column table table_b (
    id bigint not null primary key generated by default as identity,
    val_b numeric
);
My query goes like this:
select top 10 * from (
    select a.id as id_a, b.id as id_b, a.val_a * b.val_b as similarity
    from table_a a
       , table_b b
    order by a.val_a * b.val_b desc
)
Here table_a and table_b are large tables (a couple of million rows each), so the Cartesian product will not fit into the available memory.
The query works for small numbers of rows, but HANA seems to materialize the complete result set, although the top 10 could be computed in constant memory with a streaming algorithm: scan the pairs once and keep only the 10 best rows seen so far.
Does HANA have streaming algorithms for this problem available?
Is there a way to tell HANA to use these streaming algorithms?
Could this also be used to find the top-N matching elements in b for each element in a, for example by using rank() over (partition by a.id order by val_a*val_b desc)?
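To make the per-element variant concrete, this is roughly the query I have in mind. It is only a sketch: the names id_a, id_b and rnk are aliases I picked for illustration, and I have not verified whether HANA can evaluate it without materializing the full cross product.

```sql
-- Sketch only: top 10 matches in table_b for each row of table_a.
-- id_a, id_b and rnk are illustrative aliases, not existing columns.
select id_a, id_b, similarity
from (
    select a.id as id_a,
           b.id as id_b,
           a.val_a * b.val_b as similarity,
           -- rank within each a.id, best similarity first
           rank() over (partition by a.id
                        order by a.val_a * b.val_b desc) as rnk
    from table_a a
    cross join table_b b
)
where rnk <= 10;
```

Note that rank() can return more than 10 rows per id_a when similarities tie; row_number() would cap the result at exactly 10 per partition.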
Thanks in advance,
Jens