[Java] Regular expression matcher

Regular expressions are a pain, until they are not. Implementing a matcher is a double pain, and possibly the hardest exercise I've seen so far in terms of proper coding.

So many rules have to be carefully prioritised for it to succeed, even without fully supporting the common keywords. We have the following syntax:
  • alphanumeric - matches exactly that character
  • . - matches any character
  • * - must follow . or an alphanumeric character, means we can match multiple instances of the preceding character
  • ^ - must be the first character, means the matching has to begin from the start
  • $ - must be the last character, means the matching has to occur at the end
In the absence of ^ and $, the match can occur anywhere in the string.

[Java] Fisher-Yates array shuffle

The Fisher–Yates shuffle is an algorithm for generating a random permutation of a finite sequence. It can be coded to run in-place and in O(N) time where N is the input array size.

The idea is quite simple yet effective: while scanning the array from the end, select a random index with which to swap that element for among an always shrinking set of possibilities. Namely, use a getRandom function that accepts a floor and a ceiling as input and reduce the ceiling by 1 at each iteration, until the whole array has been processed.

You can check my implementation of FisherYatesShuffle on my Gist alongside some test cases in FisherYatesJTests.

Note: since we are supposed to generate truly random permutations of the input array, we have no simple way of checking for errors except to verify the output distribution itself to determine whether a bias exists or not. Therefore the current test could be extended to record for all the algorithm iterations the result of the shuffle and compare the results at the end among themselves.


[Java] Word squares tester

This exercise looked scarier than it is, although the real difficulty is not much on the algorithm itself but rather on the correct and clean implementation of it.

A “word square” is an ordered sequence of K different words of length K that, when written one word per line, reads the same horizontally and vertically. For example:


Given an arbitrary list of words, return all the possible word squares it contains. Reordering is allowed.

For example, the input list:


should output:



[Java] Calculate maximum capacity of 2D shaped object

This problem looked harder than it actually is:

Imagine an island that is in the shape of a bar graph. When it rains, certain areas of the island fill up with rainwater to form lakes. Any excess rainwater the island cannot hold in lakes will run off the island to the west or east and drain into the ocean.

Given an array of positive integers representing 2-D bar heights, design an algorithm (or write a function) that can compute the total volume (capacity) of water that could be held in all lakes on such an island given an array of the heights of the bars. Assume an elevation map where the width of each bar is 1.

Example: Given [1,3,2,4,1,3,1,4,5,2,2,1,4,2,2], return 15 (3 bodies of water with volumes of 1,7,7 yields total volume of 15)


[Java] Decompress k[value] format strings

Here is an interesting problem: decompress a compressed string.

Your input is a compressed string of the format number[string] and the decompressed output form should be the string written number times. For example:


Would be output as


Number can have more than one digit. For example, 10[a] is allowed, and just means aaaaaaaaaa

One repetition can occur inside another. For example, 2[3[a]b] decompresses into aaabaaab

Characters allowed as input include digits, small English letters and brackets [ ].

Digits are only to represent amount of repetitions.

Letters are just letters.

Brackets are only part of syntax of writing repeated substring.

Input is always valid, so no need to check its validity.

This can be done in one pass of the string by either applying recursion or explicitly using a stack for an additional O(N) space and overall O(N) time.

Gotchas are:
  • treat 0 repetitions and missing k correctly
  • remember that when concatenating data that comes from the stack we are actually prepending
  • run a last round of popping after the string is fully parsed
You can check the implementation of decompress on my Gist alongside some test cases in DecompressJTests.

[Java] Neighbour evaluation concise logic

Here is a nice trick I was taught while doing some exercises. Consider the need to perform some loop over neighbouring places for example in an array or matrix or whatever, the basic idea would be to pick a starting point and queue up all valid points reachable from that which we did not visit yet. Assume a matrix M of size N:


And X is our starting point. We could do:

 if(i - 1 >= 0 && !visited.get(M[i - 1][j])) //do something  
 if(i - 1 >= 0 && j - 1 >= 0 && !visited.get(M[i - 1][j - 1])) //do something   
 if(i - 1 >= 0 && j + 1 < N && !visited.get(M[i - 1][j + 1])) //do something     
 if(j - 1 >= 0 && !visited.get(M[i][j - 1])) //do something  
 if(j + 1 < N && !visited.get(M[i][j + 1])) //do something  
 if(i + 1 < N && j - 1 >= 0 && !visited.get(M[i + 1][j - 1])) //do something  
 if(i + 1 < N && !visited.get(M[i + 1][j])) //do something  
 if(i + 1 < N && j + 1 < N && !visited.get(M[i + 1][j + 1])) //do something  

But here is an equivalent and more concise way of achieving the same. Consider the same matrix but focus on the changes on i and j you would do:

(-1,-1) (-1,0) (-1,+1)
(0,-1)     X   (0,+1)
(+1,-1) (+1,0) (+1,+1)

Then this can easily be converted into (and even adapted to more dimensions):

 int[] di = {-1, -1, -1, 0, 0, 1, 1, 1};  
 int[] dj = {0, -1, 1, -1, 1, -1, 0, 1};  
 for(int n = 0; n < di.length; n++){  
  int x = i + di[n], y = j + dj[n];  
  if(x >= 0 && x < N && y >= 0 && y < N && !visited.get(M[x][y])) //do something  


[Java] Find longest word that can be made as subsequence of given word

Here is another interesting problem: given a string and a list of strings, determine, if any, the longest string from the list that can be built as a subsequence of the given string, by only deleting characters in it. For example, given "asd" and {"a", "ccc", "sd", "sda"} return "sd".

The complex part when working with string is as usual the fact that we potentially need to scan the string many time over and over in order to reach our goal.

In this case, we could simply check for each given string in the list, if it can be built from the given input string by looping over that and match each character until we have our answer; this takes O(L) time where L is the length of the INPUT string for a SINGLE word. So overall we would need O(NxL) time where N is the number of strings in the list.

We know can do some preprocessing to make our runtime better, at the expense of space of course. Since the input string can be very long and our result can start anywhere in there, we could do this in O(N+L) time by simultaneously process ALL words (N) for each character in the input string(L), using an additional O(N) space.