Sunday, September 6, 2015

H-Index II [LeetCode]

Problem Description
Follow up for H-Index: What if the citations array is sorted in ascending order? Could you optimize your algorithm?
Solution
In the previous version, we sorted citations in reverse order. In this version of problem, the citations array is in ascending order.
I. O(n) Solution
As we already know in the previous version of finding H-Index, we can solve the problem in linear time. Below is the 1-line Python solution:
class Solution(object):
    def hIndex(self, citations):
        """
        :type citations: List[int]
        :rtype: int
        """
        return max(min(v, len(citations)-k) for k, v in enumerate(citations + [0]))
II. O(logn) Solution
Since we already knew about how to solve the problem in linear time, we try to find another solution. As usual, the idea is to find O(logn) solution by binary search.
Our normal logic when using binary search is that we have a start and end point. We will check the middle= (start + end) /2. (In some language, to avoid overflow we can use (end-start)/2 + start )
What we need to find out now is when we should stop the search, or going to left or right.


1. When we should stop?
Suppose that middle is not in final stage (where start > end) and we decide to stop. There are (n-middle) elements are at least citations[middle], and other (middle ) elements are at most citations[middle]. This suggests that if citations[middle] = (n-middle) = h (where n is the number of elements in citations),  then we have at least h elements are at least h, and other middle = n-h elements are at most h. So h is an H-index.
Is it h  the largest? Suppose there is another h" > h and h" is also an H-index. Consider the element at n-h". Since there are at least h" elements which are at least h", then citations[n-h"] >= h">h. However, we have n-h" < n-h, therefore citations[n-h"] < citations[n-h] = h, or we have h > h. This is contradictory. So h is the largest H-index. This means that we should stop the search here!

2. When we should go to the left and right
The analysis in (1) suggest that we should compare citations[middle] and (n-middle) in order to decide whether we should go to left or right in binary search.

a. If citations[middle] < n-middle
Is it possible to stop or go left in this case? Suppose it is. That means there is a number h >= n-middle, and h is an H-index. Then there are h elements which are at least h. Therefore:
citations[n-h] > h >= n-middle > citations[middle]
However,
h >= n-middle  => middle >= n-h
That means citations[n-h] <= citations[middle]. This is contradictory.
So we should go to the right in this case.

b. If citations[middle] > n-middle
This is not same as the case (a). Easily we can see that h = n-middle is an H-index. But there might be other elements on the left satisfy the H-index condition. So we simply go to the left to search more.
Note that the current position is end + 1 after we set the end pointer to middle-1. And also, while searching on the left, if we go to the left another time, then current h= n-middle is not the largest H-index. But if we never go to the left again, then the pointer end keeps unchanged, so we can simply return n-end-1 when the search is terminated.
We come to coding now.

I. Python Code
class Solution(object):
    def hIndex(self, citations):
        """
        :type citations: List[int]
        :rtype: int
        """
        n = len(citations) 
        start, end = 0, n-1
        while start <= end:
            middle = (start + end) / 2
            if citations[middle] == n-middle: return citations[middle]
            elif citations[middle] < n - middle: start = middle + 1
            else: end = middle - 1
        return n-end-1
II. Java Code
public class Solution {
    public int hIndex(int[] citations) {
        int n = citations.length;
        int start = 0;
        int end = n-1;
        int middle;
        while (start <= end){
            middle = (end-start)/2 + start;
            if (citations[middle] == n-middle) return citations[middle];
            else if(citations[middle] < n-middle) start = middle + 1;
            else end = middle - 1;
        }
        return n - end - 1;
    }
}
III. C++ Code
class Solution {
public:
    int hIndex(vector<int>& citations) {
        int n = citations.size();
        int start = 0;
        int end = n - 1;
        int middle = 0;
        while(start <= end){
            middle = (end-start)/2 + start;
            if (citations[middle] == n-middle) return citations[middle];
            else if(citations[middle] < n-middle) start = middle + 1;
            else end = middle - 1;
        }
        return n-end-1;
    }
};

2 comments:

  1. Nice thought of binary search ! I got the point where i was missing :)

    ReplyDelete
  2. Welcome Vivek! And if you have any idea or thought, don't hesitate to share with me ^_^

    ReplyDelete