Top K Frequent Words
1. Clarify the problem: Before diving into solving the problem, let's clarify the requirements:
- We are given a list of strings.
- Our task is to find the k most frequent elements in the list.
- If there are multiple elements with the same frequency, they should be ordered lexicographically.
2. Analyze the problem: To solve this problem, we can use a combination of hash maps and sorting.
- We can use a hash map to count the frequency of each word in the list.
- After counting the frequencies, we can sort the words based on their frequency and lexicographic order.
- Finally, we can return the top k elements from the sorted list.
3. Design an algorithm: Here is the algorithm to solve the problem:
- Create a hash map
counter
to store the frequency of each word in the list. - Iterate through the list and update the frequency in the
counter
hash map. - Sort the unique words in lexicographic order.
- If two words have the same frequency, sort them lexicographically.
- We can use a custom sorting function that compares the frequency first and then the words lexicographically.
- Return the top k elements from the sorted list.
4. Explain your approach: The approach involves using a hash map to count the frequency of each word in the list. We then sort the unique words based on their frequency and lexicographic order. To achieve this, we use a custom sorting function that compares the frequency first and then the words lexicographically. Finally, we return the top k elements from the sorted list.
5. Write clean and readable code:
python
def topKFrequent(words, k):
counter = {}
for word in words:
counter[word] = counter.get(word, 0) + 1
unique_words = sorted(counter.keys(), key=lambda x: (-counter[x], x))
return unique_words[:k]
6. Test your code: Let's test the code with different test cases:
words = ["i", "love", "leetcode", "i", "love", "coding"]
,k = 2
- Expected output:
["i", "love"]
- Explanation: The word "i" appears twice, and the word "love" appears twice. Since "i" comes before "love" lexicographically, "i" is the first word in the output.
- Expected output:
words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"]
,k = 4
- Expected output:
["the", "is", "sunny", "day"]
- Explanation: The word "the" appears four times, "is" appears three times, "sunny" appears two times, and "day" appears once. Sorting these words lexicographically, we get the output as shown.
- Expected output:
words = ["aaa", "aa", "a"]
,k = 3
- Expected output:
["a", "aa", "aaa"]
- Explanation: All three words appear once, so we sort them lexicographically.
- Expected output:
7. Optimize if necessary: The initial solution already follows an optimal approach using a hash map and sorting. There are no further optimizations required.
8. Handle error cases:
The code assumes that the input words
is a valid list of strings and k
is a positive integer. If words
is empty or k
is zero, the code will still work correctly and return an empty list.
9. Discuss complexity analysis:
- The time complexity of this solution is O(n log n), where n is the length of the input list. This is because we iterate through the list to count the frequency of each word and sort the unique words.
- The space complexity is O(n), where n is the length of the input list. We store the frequency of each word in a hash map, which can take up to n elements. Additionally, the sorted list of unique words can also have up to n elements.
- The solution has an optimal time and space complexity given the problem requirements.