Find all subset of string in dictionary is to find the subset of an input string that exists in dictionary. The dictionary contains one million words. For this question, the trick is to find subset, not substring. The difference between substring and subset is: The substring is contiguous sequence of characters in the string. While in subset, characters are not ordered nor continuous.
The intuition is to get the combinations of the characters in the input string. Then check whether they exist in the dictionary. The time complexity of combinations is O(n!). If the string has 30 characters. The Time complexity is O(e^32). To improve, a hint (from interviewer) is to loop through each word in dictionary. Because one million is better than combinations.
To solve the problem, sorting plays the key role. We sort the characters in the string, and characters of the words in the dictionary. If the substring of the word exists in the sorted input string, that means the subset exists in the dictionary. The time complexity of the solution is O(d), d is the number of words in dictionary, so it is O(e^6). Sorting string of 30 characters is O(slogn) ~ O(s^2) (depend what methods you use do sort ). It is small compared to a million. It can be ignored. The time complexity O(e^6) is much better than O(e^32)!
Facebook Interview Question:
Find all words [A-Z] in a dictionary (about 1M words) that are made of a subset (in any order) of the chars in the input parameter [A-Z].
ex: input “ACRAT” (10 to 20 chars, up to 30 worst case)
matching words: “A”, “CAR”, “ACA”, “CART”, “RAC”
non-matching words: “BAR”, “AAA”
The dictionary doesn’t change but the function will be called extensively.
Java
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | import java.util.*; public class FindAllSubsetWords { //Use sort, Time O(d), Space O(d), d is dictionary length (1 million) //the sorting of input word slogs (s worst 30) can be ignored compared to d (million) //1M is better than permutation of input word s!(30! = 2e+32) public static List<String> allSubsetsInStr(String s, Set<String> dict) { List<String> res = new ArrayList<>(); char[] chs = s.toCharArray(); Arrays.sort(chs); String sortedStr = new String(chs); for (String w : dict) { char[] chw = w.toCharArray(); Arrays.sort(chw); String sortedWord = new String(chw); if (sortedStr.indexOf(sortedWord) >= 0) res.add(w); } return res; } public static void main(String[] args) { //Define dictionary Set<String> dict = new HashSet<>(Arrays.asList("A", "CAR", "ACA", "CART", "RAC", "BAR", "AAA", "AN")); System.out.println("Dictinary words: " + dict); //Find all subsets in words String str = "ACRAT"; System.out.println("Find all words from subset of " + str + " that are in dictionary: "); System.out.println(allSubsetsInStr(str,dict)); } } |
JavaScript
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | //Use sort, Time O(d), Space O(d), d is dictionary length (1 million) function findSub(input, dict) { var res = []; var charArray = Array.from(input); charArray.sort(); var s1 = charArray.join(""); for (let s of dict) { var chars = Array.from(s); chars.sort(); let s2 = chars.join(""); if (s1.indexOf(s2) >= 0) res.push(s); } return res; } let dict = new Set(["A", "CAR", "ACA", "CART", "RAC","BAR", "AAA","AN"]); console.log(findSub("ACRAT",dict)); |
Python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | #Use sort, Time O(d), Space O(d), d is dictionary length (1 million) def findSub(input, dict) : res = [] charArray = list(input) charArray.sort() s1 = ''.join(charArray) for s in dict: chars = list(s) chars.sort() s2 = ''.join(chars) if s1.find(s2) >= 0: res.append(s) return res dict = set(("A", "CAR", "ACA", "CART", "RAC","BAR", "AAA","AN")) print(findSub("ACRAT",dict)) |
Output:
[A, CAR, RAC, ACA, CART]
O Notation:
Time complexity: O(d)
Space complexity: O(d)
d is dictionary length (1 million)
Download FindAllSubsetWords.java
Download FindAllSubsetWords.js
Download FindAllSubsetWords.py