DAS is used to do analytics based on the published event data. After the analytics done these summarised data may need other external components like APIM dashboard etc. In such a case we face how to retrieve these data from the DAS to outside. One option we have is saving summarized data to external RDBMS as APIM 1.9.x used. But the next interesting option we have is using DAS REST API. REST API can be used to retrieve data from DAS internal tables.
But sometime developer may prefer extracting more specific data from the summarised data. In that case, we have to do some search. DAS rest API provide Apache Lucene based searching capabilities to the DAS tables.
As developers, we may be interesting on more specific and complex search queries on the RDBMS. But Lucene search queries are very limited and you cannot achieve SQL like data retrieval. But using Lucene aggregates API we can make data retrieval more specifically.
This blog is to explain how to use the das REST API for aggregate functions.
{ "query": "max_request_time: [1442860200000 TO 1445538600000]", "aggregateLevel": 3, "tableName": "API_FAULT_SUMMARY", "groupByField": "api_version_apiPublisher_context_facet", "aggregateFields": [ { "fieldName": "total_fault_count", "aggregate": "SUM", "alias": "totalFaultCount" } ] }
Its column should be indexed : that is for Lucene support SHould have facet attribute for aggregate functions Create facet attribute composing multiple attributes.
ex: say we need an API table having a name,request_count, version,user_count, publisher attributes. also, we need to find out the request_count and user_count grouping by the name, version, and publisher. So that we need to create extra facet attribute for a name, version, and publisher as a combined column.
ex:
CREATE TEMPORARY TABLE API_SUMMARY USING CarbonAnalytics OPTIONS (tableName "API_SUMMARY", schema "name string -i, version string -i, publisher string -i, request_count int -i, user_count int -i, name_version_publisher facet -i", primaryKeys "name,version,publisher" );
note that, we have to always keep the name column as a primitive type column along with facet composition attribute for support lucent query. Because for query attribute it is supported only for primitive type columns only.
insert into table API_SUMMARY select “sampleAPI”,”v1.0.0”,”admin”,5,2,”sampleAPI,v1.0.0,admin”;
{ "query": "", "aggregateLevel": 2, "tableName": "API_SUMMARY", "groupByField": "name_version_publisher", "aggregateFields": [ { "fieldName": "request_count", "aggregate": "SUM", "alias": "count" } ] }
the simplicity we keep empty query. And we are group by name_version_publisher field. Here we group by all the name, version and publisher. Thus aggregateLevel should be 2. If you want to group by the only name it is 0 and if group by name, version it is 1.
in a situation where you want to get all the request count of the particular API, we can use the lucene query.
ex: query: “name:myapi”
{ "query": “name”:”sampleAPI”, "aggregateLevel": 2, "tableName": "API_SUMMARY", "groupByField": "name_version_publisher", "aggregateFields": [ { "fieldName": "request_count", "aggregate": "SUM", "alias": "count" } ] }
sql equivalent query for above: select name, version, publisher, sum(request_count) from API_SUMMARY where api=”myapi” group by name, version, publisher.
You can use more than one aggregate function by adding more to aggregateFields.
ex:
{ "query": “name:sampleAPI”, "aggregateLevel": 2, "tableName": "API_SUMMARY", "groupByField": "name_version_publisher", "aggregateFields": [ { "fieldName": "request_count", "aggregate": "SUM", "alias": "count" }, { "fieldName": "user_count", "aggregate": "AVG", "alias": "user_avg" } ] }
[ { "tableName": "API_SUMMARY", "timestamp": 1446093162267, "values": { "name_version_publisher": [ "sampleAPI", "v1.0.0", "admin" ], "count": 5, "user_avg": 2 } } ]
response is in json format and it is an arrays of objects. each object contain the different result for the different groups. here we have one group and its associate result. mainly it contain the attribute for the table name, timestamp and the value. Value contain the actual results for the aggregate functions. It contain the values for the aggregate functions and the attribute value of the group.
value for the aggregate function is return with the alias name you provided with the search request like count and user_avg. And it return a record with the groupByField you used alone with a value of that group. It is a array and it’s length will equal to the aggregateLevel. And it is value preserve the same order as it’s actual data formed.
Aggregate function of a different combination of attributes. if you have many columns and you need a different type of grouping you have to define extra facet attribute with a combination of the other attributes.
ex: if you need group by name and publisher you have to define facet with a combination of name and publisher like name_publisher. Because with name_version_punlisher you can’t group by name and publisher.
Current version of DAS have following limitation in REST API and its aggregate API
Add Comment
Comments (0)