July 8, 2014

Thinking of Serial Number and Locating Computation in esProc

1.Accessing Members

Members in an esProc set (sequence) are ordered, so you can reference a member by its serial number. The more flexibly you use serial numbers, the more you get out of esProc, and the simpler and more efficient your operations become. In fact, certain esProc functions require serial numbers or a serial-number ISeq (integer sequence), such as the delete() function for deleting records and the compose() function for re-sorting a TSeq.


The simplest application is to access members directly by their serial numbers, just as you would access an array in an ordinary programming language.
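As a rough analogue (Python, not esProc syntax), the sketch below shows 1-based access by serial number. The helper name `member` is hypothetical; it simply maps serial number n to Python index n - 1.

```python
# A Python list stands in for an esProc sequence. Serial numbers start
# at 1, so serial number n maps to Python index n - 1.
def member(seq, n):
    """Return the member whose serial number is n (1-based)."""
    if not 1 <= n <= len(seq):
        raise IndexError(f"serial number {n} is outside 1..{len(seq)}")
    return seq[n - 1]


names = ["Ann", "Bob", "Cid", "Dee"]
print(member(names, 1))  # Ann
print(member(names, 3))  # Cid
```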


You can use the m() function to get members counting backwards from the end, or cyclically.
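A Python sketch of the behaviour described for m(), assuming that a negative serial number counts from the end (-1 is the last member) and that the loop mode wraps around past either end. The function name and its `cyclic` flag are hypothetical stand-ins, not esProc's actual signature; returning None for out-of-range positions is this sketch's choice.

```python
def m(seq, n, cyclic=False):
    """Hypothetical analogue of m(): 1-based access with backward and cyclic modes."""
    size = len(seq)
    if n < 0:
        n = size + n + 1  # -1 -> last member, -2 -> second to last, ...
    if cyclic:
        return seq[(n - 1) % size]  # wrap around past either end
    return seq[n - 1] if 1 <= n <= size else None


seq = [10, 20, 30, 40]
print(m(seq, -1))               # 40 (last member)
print(m(seq, 6, cyclic=True))   # 20 (wraps past the end)
```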



In addition, esProc provides a series of functions whose names begin with the letter "p", such as pos; these functions search for the serial numbers of members.




If the specified member is not found in the sequence, the pos function returns 0, so it can be used to check whether a member belongs to a set.
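The behaviour of pos described above can be sketched in Python as follows. This is an analogue for illustration, not esProc code: it returns the 1-based serial number of the first matching member, or 0 when the member is absent, and 0 being falsy makes the membership test natural.

```python
def pos(seq, x):
    """1-based serial number of the first member equal to x, or 0 if absent."""
    for i, v in enumerate(seq, start=1):
        if v == x:
            return i
    return 0


letters = ["a", "b", "c", "b"]
print(pos(letters, "b"))  # 2 (first occurrence)
print(pos(letters, "z"))  # 0 (not found)
if not pos(letters, "z"):
    print("z is not a member")
```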




2.Accessing Subsets


With serial numbers, you can access the subsets of a set.



You can also use the m() function to access a subset by specifying the corresponding serial numbers.
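A Python sketch of subset access by serial numbers. The helper `subset` is hypothetical; it accepts any iterable of 1-based serial numbers and, as in the m() analogue, treats negative numbers as counting from the end (an assumption about m(), not a statement of its exact rules).

```python
def subset(seq, serials):
    """Members at the given 1-based serial numbers, in the order listed."""
    size = len(seq)
    result = []
    for n in serials:
        if n < 0:
            n = size + n + 1  # negative serial numbers count from the end
        if not 1 <= n <= size:
            raise IndexError(f"serial number {n} is outside 1..{size}")
        result.append(seq[n - 1])
    return result


seq = ["a", "b", "c", "d", "e"]
print(subset(seq, [2, 4]))         # ['b', 'd']
print(subset(seq, [-1, -2]))       # ['e', 'd']
print(subset(seq, range(2, 5)))    # ['b', 'c', 'd']
```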


Similarly, you can add the @a option to a position search function to get the serial numbers of all the members that satisfy a given condition.
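In Python terms, an "all positions" search amounts to collecting every 1-based serial number whose member satisfies a predicate. The name `positions_where` is hypothetical and only mirrors the idea of the @a option.

```python
def positions_where(seq, pred):
    """1-based serial numbers of all members for which pred(member) is true."""
    return [i for i, v in enumerate(seq, start=1) if pred(v)]


scores = [3, 8, 1, 9]
print(positions_where(scores, lambda v: v > 2))   # [1, 2, 4]
print(positions_where(scores, lambda v: v > 10))  # []
```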




To get the positions of several members at once, you can also use the pos function; in certain cases the @i option may be required.




When searching for several members, pos@i returns null if any of them is not found in the sequence. Because misordered or repeated members can also cause null to be returned, you cannot rely on this function alone to check whether a set contains the specified subset; use an intersection operation instead.
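The Python sketch below contrasts three approaches. All function names are hypothetical: `pos_many` finds each member anywhere, `pos_in_order` is a strict variant that requires the members to appear in the same order at increasing positions (so misordered or repeated members yield None), and `contains_all` performs the inclusion test through a set intersection, which ignores order and repetition.

```python
def pos_many(seq, xs):
    """1-based positions of every member of xs, or None if any is missing."""
    result = []
    for x in xs:
        p = next((i for i, v in enumerate(seq, start=1) if v == x), 0)
        if p == 0:
            return None
        result.append(p)
    return result


def pos_in_order(seq, xs):
    """Strict variant: each member of xs must be found after the previous one."""
    result, start = [], 0
    for x in xs:
        for i in range(start, len(seq)):
            if seq[i] == x:
                result.append(i + 1)
                start = i + 1
                break
        else:
            return None  # missing, misordered, or repeated too often
    return result


def contains_all(seq, xs):
    """Inclusion test via intersection (plain set semantics)."""
    return set(xs) & set(seq) == set(xs)


seq = ["a", "b", "c"]
print(pos_many(seq, ["c", "a"]))      # [3, 1]
print(pos_in_order(seq, ["c", "a"]))  # None, although both members exist
print(contains_all(seq, ["c", "a"]))  # True
```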




3.Locating by Using the Loop Function


In a loop function, just as the symbol ~ denotes the current member, the symbol # denotes the serial number of the current member.
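In Python, `enumerate(..., start=1)` gives roughly what ~ and # provide inside an esProc loop function: the current member and its 1-based serial number. This is an analogue only; the variable names are illustrative.

```python
scores = [72, 85, 90, 64]

# i plays the role of #, v plays the role of ~
labelled = [f"{i}:{v}" for i, v in enumerate(scores, start=1)]
print(labelled)  # ['1:72', '2:85', '3:90', '4:64']

# Use the serial number in a condition: keep members at odd positions
odd = [v for i, v in enumerate(scores, start=1) if i % 2 == 1]
print(odd)  # [72, 90]
```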




In a loop function, you can use the symbol [] to access members relative to the current one.
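A Python sketch of relative access: `rel` (a hypothetical helper) fetches the member at an offset from the current 1-based position, and here it computes each day's change from the previous one. Returning None for an offset outside the sequence is this sketch's choice.

```python
def rel(seq, i, offset):
    """Member at relative offset from 1-based position i, or None if outside."""
    j = i + offset
    return seq[j - 1] if 1 <= j <= len(seq) else None


prices = [10.0, 10.5, 10.2, 11.0]
changes = []
for i, v in enumerate(prices, start=1):
    prev = rel(prices, i, -1)  # analogous to the previous member
    changes.append(None if prev is None else round(v - prev, 2))
print(changes)  # [None, 0.5, -0.3, 0.8]
```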



Similarly, you can use the symbol {} to access subsets relative to the current member.
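A typical use of relative subsets is a moving window. In this Python sketch, the hypothetical `rel_range` returns the members from offset `lo` to offset `hi` around the current position, clipped to the sequence bounds (a choice made for this sketch), and the windows feed a 3-point moving average.

```python
def rel_range(seq, i, lo, hi):
    """Members from offset lo to offset hi around 1-based position i, clipped."""
    start = max(1, i + lo)
    end = min(len(seq), i + hi)
    return seq[start - 1:end]


sales = [4, 8, 6, 2, 10]
moving_avg = []
for i in range(1, len(sales) + 1):
    window = rel_range(sales, i, -1, 1)  # previous, current, next member
    moving_avg.append(sum(window) / len(window))
print(moving_avg)  # first value averages [4, 8] because the window is clipped
```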



4.Alignment Access


As mentioned, the symbol # in a loop function indicates the serial number of the current member. It is an ordinary number and can be used in calculations like any other number. In particular, it can serve as a serial number for accessing a member of another sequence, which is what makes alignment access possible.




When independent sequences are arranged in the same order, you can use alignment access to combine them into records, taking each field from the member at the same position in the corresponding sequence.
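A Python sketch of alignment access: the serial number of the current member of `names` is used to pick the member at the same position in the other, identically ordered sequences, producing one record per position. The data and field names are illustrative.

```python
names = ["Ann", "Bob", "Cid"]
depts = ["R&D", "Sales", "HR"]
salaries = [5200, 4800, 4500]

# i acts like # and indexes the other sequences at the same position
records = [
    {"name": n, "dept": depts[i - 1], "salary": salaries[i - 1]}
    for i, n in enumerate(names, start=1)
]
print(records[1])  # {'name': 'Bob', 'dept': 'Sales', 'salary': 4800}
```

In idiomatic Python, `zip(names, depts, salaries)` does the same job; the explicit index is kept here only to mirror the use of # as a serial number.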




5.Sequence Alignment


Alignment access requires all the sequences to be arranged in the same order, but in practice they often are not. In that case, use the align function to reorder a sequence according to the order of a standard sequence, so that the sequences line up.



In fact, the align group function align@a also returns a sequence aligned with the standard sequence; in this case, however, each member of the result is a set (a group).




Without @a, the align() function fetches only the first member of each group and returns a set of these first members instead of a set of subsets. If every group contains exactly one member, align() simply reorders the members according to the standard sequence.
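A Python sketch of the two alignment modes described above. `align_groups` mimics the grouped result (one list per member of the standard sequence, in the standard's order), and `align_first` keeps only the first member of each group. Both names are hypothetical; dropping members whose key is absent from the standard and using None for empty groups are this sketch's assumptions.

```python
def align_groups(seq, standard, key):
    """One group per member of standard, in the standard's order."""
    slot = {k: i for i, k in enumerate(standard)}
    groups = [[] for _ in standard]
    for v in seq:
        i = slot.get(key(v))
        if i is not None:  # members not in the standard are dropped
            groups[i].append(v)
    return groups


def align_first(seq, standard, key):
    """First member of each aligned group, or None for an empty group."""
    return [g[0] if g else None for g in align_groups(seq, standard, key)]


orders = [("East", 100), ("West", 80), ("East", 50), ("North", 30)]
regions = ["West", "East", "South"]
print(align_groups(orders, regions, key=lambda o: o[0]))
print(align_first(orders, regions, key=lambda o: o[0]))
```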


Similarly, alignment access can be used with an enum group, although enum@1 is rarely used in this case.


6.Interval Integer Sequence


An integer sequence is a special kind of set to which all set operations apply. It can also serve as a sequence of serial numbers for accessing a subset of another sequence. Using integer sequences fluently is key to developing the serial-number way of thinking.



You can process a subset through an integer sequence that holds the positions of its members in the original set.
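A Python sketch of interval integer sequences used both as sets and as serial numbers. The helpers `interval`, `at` and `update_at` are hypothetical names: the first builds the sequence a, a+1, ..., b, the second reads a subset by serial numbers, and the third processes only the members at the given positions.

```python
def interval(a, b):
    """Integer sequence a, a+1, ..., b (inclusive)."""
    return list(range(a, b + 1))


def at(seq, serials):
    """Members at the given 1-based serial numbers."""
    return [seq[n - 1] for n in serials]


def update_at(seq, serials, f):
    """Copy of seq with f applied to the members at the given serial numbers."""
    out = list(seq)
    for n in serials:
        out[n - 1] = f(out[n - 1])
    return out


data = [5, 3, 8, 1, 9, 2]
evens = [n for n in interval(1, len(data)) if n % 2 == 0]  # [2, 4, 6]
middle = at(data, interval(2, 4))                          # [3, 8, 1]
doubled = update_at(data, evens, lambda v: v * 2)          # [5, 6, 8, 2, 9, 4]
common = sorted(set(interval(1, 5)) & set(interval(3, 8))) # set operation: [3, 4, 5]
```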



7.ISeq Consisting of Serial Numbers


After a sequence is sorted, the original order of its members is lost. In certain cases, however, that order information is still needed: for example, the order in which the company's three oldest employees joined it, or how much a stock's price rose on the three trading days when its price was highest.


The psort function in esProc solves this problem: it returns the original serial numbers of the members, arranged in sorted order.


In plain words, in the integer sequence returned by psort, the first number is the serial number (in the original sequence) of the member that should be placed first after sorting, the second number is the serial number of the member that should be placed second, and so on.
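The same idea in Python: a psort-like function is an "argsort" that returns 1-based serial numbers. Taking the members at those serial numbers reproduces the sorted sequence, while the serial numbers themselves keep the original positions. The function name follows the article, but this is a Python sketch, not esProc's implementation.

```python
def psort(seq, key=None, reverse=False):
    """1-based serial numbers of seq's members, listed in sorted order."""
    if key is None:
        key = lambda v: v
    return sorted(range(1, len(seq) + 1),
                  key=lambda n: key(seq[n - 1]), reverse=reverse)


ages = [34, 58, 41, 62, 29]
order = psort(ages)
print(order)                          # [5, 1, 3, 2, 4]
print([ages[n - 1] for n in order])   # [29, 34, 41, 58, 62]
```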


Conversely, for a sequence that has been rearranged by a serial-number ISeq, you can use the inv() function to get the inverse ISeq and use it to restore the original order.
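An inverse serial-number sequence is the inverse permutation. In this Python sketch (the name `inv` follows the article; the code is only an analogue), if the member at original position p was moved to new position q, the inverse records q at position p, so indexing the rearranged sequence with it restores the original.

```python
def inv(perm):
    """Inverse of a 1-based serial-number permutation."""
    result = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm, start=1):
        result[old_pos - 1] = new_pos
    return result


original = ["c", "a", "d", "b"]
perm = [2, 4, 1, 3]                               # sorted order of original
rearranged = [original[n - 1] for n in perm]      # ['a', 'b', 'c', 'd']
restored = [rearranged[n - 1] for n in inv(perm)]
print(restored)                                   # ['c', 'a', 'd', 'b']
```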



The psort function can therefore solve the problems above, in which the original serial numbers must be kept.
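A Python sketch of the stock example, using made-up closing prices (not data from the article). A descending psort gives the serial numbers of the three highest closes. Because these are positions in the original daily order, the rise on each of those days is simply the close at position n minus the close at position n - 1.

```python
def psort_desc(seq):
    """1-based serial numbers ordered by value, highest first."""
    return sorted(range(1, len(seq) + 1), key=lambda n: seq[n - 1], reverse=True)


closes = [10.0, 10.8, 10.5, 11.2, 11.0, 10.7]  # illustrative daily closes
top3 = psort_desc(closes)[:3]                   # positions of the 3 highest closes
increases = [
    (n, round(closes[n - 1] - closes[n - 2], 2) if n > 1 else None)
    for n in top3
]
print(top3)       # [4, 5, 2]
print(increases)  # [(4, 0.7), (5, -0.2), (2, 0.8)]
```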



A binary search is widely recognized for its efficiency, but it requires the sequence to be sorted by the search key, so the sequence must be sorted before the search. That approach does not suit every case. If you only want to find a member, you can of course sort the sequence first; but if you want the member's serial number in the original sequence, sorting would destroy the original order. In this case, you should use the psort() function instead.



Here, psort creates a binary search index for the sequence; a single sequence can have one or more such indexes, one per search key.
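A Python sketch of a psort-based binary search index, using the standard `bisect` module. `build_index` and `find` are hypothetical names: the index stores serial numbers sorted by key, plus the keys in that order, so a lookup returns the member's serial number in the original, unsorted sequence (0 if absent). Two indexes over the same sequence illustrate one index per search key.

```python
from bisect import bisect_left


def build_index(seq, key):
    """psort-style index: serial numbers ordered by key, with their keys."""
    order = sorted(range(1, len(seq) + 1), key=lambda n: key(seq[n - 1]))
    keys = [key(seq[n - 1]) for n in order]
    return order, keys


def find(index, target):
    """Binary search; original 1-based serial number of target, or 0."""
    order, keys = index
    i = bisect_left(keys, target)
    return order[i] if i < len(keys) and keys[i] == target else 0


staff = [("E07", "Kim"), ("E02", "Lee"), ("E05", "Ray"), ("E01", "Zoe")]
by_id = build_index(staff, key=lambda r: r[0])
by_name = build_index(staff, key=lambda r: r[1])
print(find(by_id, "E05"))    # 3, its position in the unsorted staff list
print(find(by_name, "Zoe"))  # 4
```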


In addition, an align group function can also return an ISeq of serial numbers instead of the aligned sequence.

 



8.Locating Computation 


After working out the serial numbers of the records needed, we can compute the required results with the locating computation A.calc(). Locating computation avoids unnecessary computation and improves efficiency.




In this example, the binary file VoteRecord stores poll results sorted by votes in descending order. A4 holds the computed sequence of IDs of the employees from a specified state, and A5 computes how many more votes each of them needs to move up one place. For example, Ryan Williams, currently ranked 3rd, needs another 69 votes to move up one place. A cross-row operation is needed here, because the result cannot be computed from the data of the selected employees alone.
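A Python sketch of the locating computation, with hypothetical in-memory records standing in for VoteRecord (already sorted by votes, descending). The serial numbers of the selected employees are located first; then the hypothetical `calc` evaluates an expression only at those positions, and the expression reaches across rows to the record ranked just above. This sketch reports the raw gap; whether a tie counts as moving up is not specified in the text.

```python
ranking = [  # hypothetical data, sorted by votes descending
    {"id": 11, "state": "CA", "votes": 950},
    {"id": 12, "state": "TX", "votes": 900},
    {"id": 13, "state": "CA", "votes": 850},
    {"id": 14, "state": "NY", "votes": 800},
    {"id": 15, "state": "CA", "votes": 760},
]


def calc(seq, serials, f):
    """Evaluate f(seq, n) only at the given 1-based serial numbers."""
    return [f(seq, n) for n in serials]


def votes_to_move_up(seq, n):
    """Gap to the record ranked just above n; None for first place."""
    return None if n == 1 else seq[n - 2]["votes"] - seq[n - 1]["votes"]


serials = [n for n, r in enumerate(ranking, start=1) if r["state"] == "CA"]
gaps = calc(ranking, serials, votes_to_move_up)
print(serials)  # [1, 3, 5]
print(gaps)     # [None, 50, 40]
```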


Related:
1.Thinking of Set in esProc

2.Referencing Thoughts in esProc

3.Cursor Thoughts in esProc

4.Basic Data Type in Data Processing Programming Language
