Thursday, December 17, 2015

Remove Duplicate Letters [LeetCode]

Problem Description
Given a string which contains only lowercase letters, remove duplicate letters so that every letter appear once and only once. You must make sure your result is the smallest in lexicographical order among all possible results.
Example:
Given "bcabc"
Return "abc"
Given "cbacdcbc"
Return "acdb"
Credits:
Special thanks to @dietpepsi for adding this problem and creating all test cases.
Solutions
In my personal opinion, this problem is kinda interesting.

I. Simple Solution - O(kN) 
(Where k is the total number of distinct characters, N is length of the original string)

My first question after reading the problem description is that: What is the first character of the possible result? Can I determine it?

Yes, we can determine by an observation. If we call totalChars is the total number of distinct characters in the string. Then the required result must have a length of totalChars.  

Furthermore, its first character must be the smallest among all possible candidates. So what is a possible candidate? We easily see that a character is able to be the first character in a possible result string must have (totalChars - 1) distinct characters staying behind it in the original string.

For example, given s="bcabc", there are possible results "bca", "cab", "abc". We see that b=s[0] has 2 = totalChars-1 distinct characters behind it "a", "c". Similarly with c=s[1] and a=s[2]. Among these 3 possible candidates,  a=s[2] is the smallest. If there are 2 possible candidates with the same character, we just pick the one with smaller position.

Now after getting the first character of the result, what we do next? We can eliminate that character from the original string (by marking it as assigned). Then we repeat the process for the rest of characters!

Below is the java code.
public class Solution {

    public int ASSIGNED = -1;
    public int UNTOUCHED = 0;
    public int TOUCHED = 1;
    public char LARGE_CHAR = (char) 127;
    public String removeDuplicateLetters(String s){
        int n = s.length();
        if(n == 0) return s;
        
        //We use 128 is to avoid substraction
        //if we use 26, we have to substract 'a' from a char
        int[] status = new int[128];
        
        char c, smallestChar;
        int totalChars = 0;
        
        for(int i=0; i<n; i++){
            c = s.charAt(i);
            if(status[c] == UNTOUCHED) totalChars++;
            status[c] = TOUCHED;
        }
        
        StringBuilder bd = new StringBuilder();
        int tt = -1; //temp variable
        int last = -1; //last position of char that was assigned
        
        for(int i=totalChars; i>0; i--){
            smallestChar = LARGE_CHAR;
            totalChars = 0;
            
            //reset the status array
            for(int j='a'; j<='z'; j++) 
                if (status[j] == TOUCHED) status[j] = UNTOUCHED;
            
            //choose the smallest candiate by running backward
            for(int j=n-1; j>last; j--){
                c = s.charAt(j);
                if(status[c] == ASSIGNED) continue;
                if(status[c] == UNTOUCHED)totalChars++;
                
                if(totalChars == i) {
                    if(c <= smallestChar){
                        smallestChar = c;
                        tt = j;
                    }
                }
                status[c] = TOUCHED;
            }
            
            status[smallestChar] = ASSIGNED; //marked as assigned
            last = tt;
            bd.append(smallestChar);
        }
        
        return bd.toString();
    }
}
II. Better Solution - O(hN)
(Where h is a number between 1 and the total number of distinct characters)

After coming up with the above solution, I mumble "smallest candidates, smallest char, smallest possible, smallest..." . Yes, in that solution, we tried to find a smallest single character, why don't we try to find a smallest set of characters? Let's go on that direction!

We call a candidate set is a subset of distinct characters from 0 to i in the original string s so that there are still enough available characters from (i+1) to the end of string to make up totalChars distinct character.

Let's consider s="bcabc", at position 0, the candidate set is {b}. At 1, the candidate sets are {b}, {c}, {b,c}. At 2, the candidate sets are {b}, {c}, {a}, {b,c}, {c,a}, {b,c,a}. And our purpose is the same: find the smallest candidate set (by smallest, we mean lexicographically smallest). I.e, At 1, smallest candidate set is obviously {b}, at 1 it is {b}, and at 2 is {a}.


Suppose at position i, we have the smallest candidate set {a0,a1,..., ak}. Now at position i+1, what is the smallest candidate set? We know that {a0,a1,..., ak} are distinct. 

If s[i+1] already in {a0,a1,..., ak}, is it possible that there is a subset {ai0, ai1, ..., aij} so that  {ai0, ai1, ..., aij, s[i+1]}{a0,a1,..., ak} is also a candidate set? If yes, we can easily see that {ai0, ai1, ..., aij} is a candidate set at position i, and {ai0, ai1, ..., aij} < {a0,a1,..., ak} . This means that {a0,a1,..., ak} is not the smallest candidate set at position i. Therefore, we don't need to care if s[i+1] is already in the candidate set (or we call it "assigned").

Now, if s[i+1] is not "assigned",  we have the same question - is it possible that there is a subset {ai0, ai1, ..., aij} so that  {ai0, ai1, ..., aij, s[i+1]} < {a0,a1,..., ak} is also a candidate set? If ak > s[i+1], and there are still characters that equals ak after i+1, we can remove ak and check again with ak-1; if there is no more, replace it by s[i+1]. If ak < s[i+1], we cannot replace ak by s[i+1]. So we just simply add s[i+1] to the set. Simply enough?

And to represent the smallest candidate set, we can use linked list, or array. Below are 2 different implementations.

a) Using Linked List (also by array ^_^)
public class Solution {

    public static char START = (char)('a'-1);
    public String removeDuplicateLetters(String s){
        if(s.length() == 0) return s;
        
        //We use 128 is to avoid substraction
        //if we use 26, we have to substract 'a' from a char
        int[] count = new int[128];
        char[] prev = new char[128];
        boolean[] assigned = new boolean[128];
        char c;
        char end = START;
        
        for(int i=0; i<s.length(); i++){
            c = s.charAt(i);
            count[c]++;
        }
        
        for(int i=0; i<s.length(); i++){
            c = s.charAt(i);
            count[c]--;
            if(assigned[c])
                continue;
                
            while(end >= c && count[end]>0){
                assigned[end] = false;
                end = prev[end];
            }
            
            prev[c] = end;
            end = c;
            assigned[c] = true;
        }
        
        StringBuilder bd = new StringBuilder();
        while(end>START){
            bd.append(end);
            end = prev[end];
        }
        return bd.reverse().toString();
    }
}
b) Using array (which similar to stack)
public class Solution {

    public String removeDuplicateLetters(String s){
        if(s.length() == 0) return s;
        
        //We use 128 is to avoid substraction
        //if we use 26, we have to substract 'a' from a char
        int[] count = new int[128];
        char[] result = new char[26];
        boolean[] assigned = new boolean[128];
        char c;
        int end = -1;
        
        for(int i=0; i<s.length(); i++){
            count[s.charAt(i)]++;
        }
        
        for(int i=0; i<s.length(); i++){
            c = s.charAt(i);
            count[c]--;
            if(assigned[c])
                continue;
                
            while(end >= 0 && result[end] > c && count[result[end]]>0){
                assigned[result[end]] = false;
                end--;
            }
            
            end++;
            result[end] = c;
            assigned[c] = true;
        }
        
        StringBuilder bd = new StringBuilder();
        for(int i=0; i<=end; i++){
            bd.append(result[i]);
        }
        return bd.toString();
    }
}

No comments:

Post a Comment