What happens when … a failed healthcare system is opened to FDI

So there are many takers for Indian hospitals. This is very interesting given that India's healthcare spending as a percentage of GDP (~3.9% including the private sector, ~1.8% excluding it) is one of the smallest in the world, whereas it has the world's second largest population.

In this ‘what happens when’ we explore this scenario a bit. In India the demand for healthcare is definitely there. The supply is skewed towards urban areas, and the variance in availability and quality of service is massive, especially when comparing Government-run hospitals with private hospitals.

What cannot be argued against is that there is a massive gap between demand and supply, with demand far outstripping supply. This gap is likely to widen further as the population ages, and it grows larger as we move down the wealth ladder.

In theory, investment is supposed to increase supply (if used to build capacity). But that model works better for products than for services, especially where the services are specialised in nature (e.g. MRI scanning, Gamma Knife), require extensive training (e.g. general practitioners, dentists), or both (e.g. neuro-surgeons, heart specialists, cancer specialists, neo-natal specialists).

Therefore more money today does not mean more doctors tomorrow. It means higher packages being offered to currently qualified doctors and specialists, whether they work in the private or the public sector. This means increased demand for a resource in limited supply. In turn, to recoup the higher input costs, hospitals will have to increase their charges, reduce other costs, or somehow battle to increase occupancy rates (perhaps by connecting with health insurance schemes?). This becomes even more important once we take into account that any investment, whether long term or short term, fixed or variable, will require a return.

So it leads to some disturbing conclusions:

  1. Brain drain away from the public sector into the private
  2. Providers sticking to safe markets (e.g. urban areas)
  3. Increased gap between quality and availability of healthcare as the costs rise
  4. Rising inequality in terms of access to healthcare
  5. Increased reliance on insurance to come in and plug the gap between treatment costs and income (insurance – healthcare provider nexus)

To think positively one can look at the silver lining:

  1. It would encourage the setting up of integrated medi-cities (treatment, training and research) and the expansion of medical education (suddenly all those medical colleges churning out MBBS graduates will have more incentive to expand and improve the quality of their education, especially if foreign-owned healthcare facilities are more discerning employers than their local counterparts)
  2. There may be some risk takers driven by new investments, who may want to explore newer markets (e.g. smaller cities, villages) and come up with innovative business models for healthcare delivery
  3. Increased accountability and a driver to improve medical insurance (the US model)
  4. Turning to medical tourism to ‘subsidise’ treatments for locals (in the same way UK universities use foreign students to subsidise home students)
  5. Faster and (hopefully) cheaper access to advanced treatments

In all of this, the citizens of India and the Government will have to act as watchdogs, ensuring that FDI does not result in exploitative practices or a long-term misalignment of India's healthcare system.

 

Recurrent Neural Networks to Predict Pricing Trends in UK Housing Market

Recurrent Neural Networks (RNN):

RNNs are used when temporal relationships have to be learnt. Some common examples include time series data (e.g. stock prices), sequence of words (e.g. predictive text) and so on.

The basic concept of RNNs is that we train an additional set of weights (along with the standard input–output weights) that associate the past state (time: t-1) with the current state (time: t). This can then be used to predict the future state (time: t+1) given the current state (time: t). In other words, RNNs are NNs with state!

When applied to standard time series prediction, the input and output values are taken from the same time series (usually a scalar value). This is the degenerate case of single-valued inputs and outputs: we need to learn the relationship between x(t-1) and x(t) so that we can predict the value of x(t+1) given x(t). This is what I did for this post.

The time series problem can be made more complicated by making the input a vector of different parameters; the output may still remain a scalar value (a single component of x) or become a vector itself. One reason this is done is to add all the factors that may impact the value to be predicted (e.g. x(t+1)). In our example of average house prices we may want to add factors such as the time of the year, interest rates, salary levels, inflation etc. to provide some more “independent” variables in the input.

Two final points:

  • Use-cases for RNNs: Speech to Text, Predictive Text, Music Tagging, Machine Translation
  • RNNs include the additional complexity of training in Time as well as Space, therefore our standard Back-Propagation becomes Back-Propagation Through Time

RNN Structure for Predicting House Prices:

RNN simple time series

The basic time series problem is that we have a sequence of numbers – the average price of houses for a given month and year (e.g. given: X(1), X(2), … X(t-1), X(t)) with a regular step size – and our task is to predict the next number in the sequence (i.e. predict X(t+1)). In our problem the average price is calculated for every month since January 1995 (thus the step size is 1 month). As a first step we need to define a fixed sequence size that we are going to use for training the RNN. For the input data we will select a sub-sequence of a given length equal to the number of time steps (in the diagram above there are three). For the training output we will select a sub-sequence of the same length as the input, but with the values shifted one step into the future.

Thus if the input sub-sequence is X(3), X(4) and X(5) then the output sub-sequence must be X(4), X(5) and X(6). In general, if the input sub-sequence spans time steps a to b, where b > a and b - a + 1 = sub-sequence length, then the output sub-sequence must span a+1 to b+1.

Once training has been completed, if we provide the last sub-sequence as input we will get the next number in the series as the output. We can see how well the RNN is able to replicate the signal by starting with a sub-sequence in the middle, moving ahead one time step at a time, and plotting actual vs predicted values for the next number in the sequence.

Remember to NORMALISE the data!
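
As a minimal sketch of the data preparation, assuming the monthly average prices have been loaded into a NumPy array (the file, array and function names below are my own stand-ins, not taken from the full code linked at the end):

import numpy as np

# Monthly average prices, oldest first (assumed file name)
prices = np.loadtxt("monthly_avg_prices.csv")

# Min-max normalise to the range [0, 1]
p_min, p_max = prices.min(), prices.max()
norm = (prices - p_min) / (p_max - p_min)

def next_batch(batch_size, n_steps):
    # Pick random starting points, then build input/output sub-sequences shifted by one step
    starts = np.random.randint(0, len(norm) - n_steps - 1, size=batch_size)
    X_batch = np.array([norm[s : s + n_steps] for s in starts])
    y_batch = np.array([norm[s + 1 : s + n_steps + 1] for s in starts])
    return X_batch.reshape(-1, n_steps, 1), y_batch.reshape(-1, n_steps, 1)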

The parameters are as below:

n_steps = 36 # Number of time steps (thus a = 0 and b = 35, total of 36 months)

n_inputs = 1 # Number of inputs per step (the avg. price for the current month)

n_neurons = 1000 # Number of neurons in the middle layer

n_outputs = 1 # Number of outputs per step (the avg. price for the next month)

learning_rate = 0.0001 # Learning Rate

n_iter = 2000 # Number of iterations

batch_size = 50 # Batch size

I am using TensorFlow’s BasicRNNCell (complete code at the end of the post) but the basic setup is:

import tensorflow as tf

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

cell = tf.contrib.rnn.OutputProjectionWrapper(tf.contrib.rnn.BasicRNNCell(num_units = n_neurons, activation = tf.nn.relu), output_size=n_outputs)

outputs, states = tf.nn.dynamic_rnn(cell, X, dtype = tf.float32)

loss = tf.reduce_mean(tf.square(outputs-y))
opt = tf.train.AdamOptimizer(learning_rate=learning_rate)
training = opt.minimize(loss)

saver = tf.train.Saver()

init = tf.global_variables_initializer()
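
The complete code is linked at the end of the post; purely as an illustrative sketch, the training loop for the setup above could look something like this (next_batch being a routine, like the one sketched earlier, that returns arrays shaped [batch_size, n_steps, 1]):

with tf.Session() as sess:
    sess.run(init)
    for step in range(n_iter):
        X_batch, y_batch = next_batch(batch_size, n_steps)
        sess.run(training, feed_dict={X: X_batch, y: y_batch})
        if step % 100 == 0:
            # Track the Mean Squared Error on the current batch
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(step, "MSE:", mse)
    saver.save(sess, "./housing_rnn_model")   # checkpoint path is arbitrary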

Results:

For a sample of 3 runs, using a Mean Squared Error threshold of 1e-4, we get the following error values:

  1. 8.6831e-05
  2. 9.05436e-05
  3. 9.86998e-05

Run 3 fitting and predictions are shown below:

Orange dots represent the prediction by the RNN and Blue dots represent the actual data

 

Run 3 prediction against existing data 3 years before October 2017

Then we start from October 2017 (Month 24 in the figure below) and forecast ahead to October 2018. This predicts a rise in average prices which starts to plateau in the third quarter of 2018. Given that average house prices across a country like the UK are determined by a large number of noisy factors, we should take this prediction with a pinch of salt.

Run 3 Forecasting from Month 24 (October 2017 for the year ahead till October 2018)

For a sample of 3 runs, using a Mean Squared Error threshold of 1e-3, we get the following error values:

  1. 3.4365e-04
  2. 4.1512e-04
  3. 2.1874e-04

With a higher error threshold we find, when comparing against actual data (Runs 2 and 3 below), that the predicted values have a lot less overlap with the actual values. This is expected, as we have traded accuracy for a reduction in training time.

predicted avg price vs actual avg price (Run 2)

predicted avg price vs actual avg price (Run 3)

The projections in this case are a lot different: we see a linearly decreasing average price in 2018.

predicted avg price vs actual avg price with forecast

Next Steps:

I would like to add more parameters to the input, but it is difficult to get matching time-series data for factors such as interest rates, inflation etc.

I would also like to try other types of networks (e.g. LSTM) but I am not sure if that would be the equivalent of using a cannon to kill a mosquito.

Finally if anyone has any ideas on this I would be happy to collaborate with them on this!

 

Source code can be found here: housing_tf

Contains HM Land Registry data © Crown copyright and database right 2017. This data is licensed under the Open Government Licence v3.0.

UK Housing Data Analysis: Additional Price Paid Entry

I have been exploring the HM Land Registry Price Paid Data and have discovered a few more things of interest.

The data contains a ‘Price Paid Data Category Type’ (this is the second last column at the time of writing this post). As per the description of the schema, this field can have one of two values:

A = Standard Price Paid entry, includes single residential property sold for full market value.
B = Additional Price Paid entry including transfers under a power of sale/repossessions, buy-to-lets (where they can be identified by a Mortgage) and transfers to non-private individuals.

Therefore it seems there is a way of looking at how properties sold for full market value differ from buy-to-lets, repossessions and power of sale transactions. Proper Category B tracking only starts from October 2013.

Before we do this it is worthwhile to use the ‘Property Type’ field to filter out properties of type ‘Other’ which contribute to the overall noise because these are usually high value properties such as office buildings. The ‘Property Type’ field has the following values:

  • D = Detached
  • S = Semi-Detached
  • T = Terraced
  • F = Flats/Maisonettes
  • O = Other

Data Pipeline for all transactions:

Step 1: Filter out all transactions with Property Type of Other

Step 2: Group using Year and Month of Transaction

Step 3: Calculate Standard Deviations in Price, Average Price and Counts

 

Data Pipeline for Standard and Additional Price Paid Transactions (separate):

Step 1: Filter out all transactions with Property Type of Other

Step 2: Group using Price Paid Data Category Type, Year and Month of Transaction

Step 3: Calculate Standard Deviations in Price, Average Price and Counts

Tech stuff:

I used a combination of MongoDB (aggregation pipelines for standard heavy weight aggregations – such as simple grouping), Apache Spark (Java based for heavy weight custom aggregations) and Python (for creating graphs and summarising aggregated data)
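
For illustration, the ‘all transactions’ pipeline above could be expressed as a MongoDB aggregation via pymongo roughly as follows (the database, collection and field names are my own stand-ins, and the transaction date is assumed to be stored as a proper date field):

from pymongo import MongoClient

client = MongoClient()
prices = client["land_registry"]["price_paid"]   # assumed database and collection names

pipeline = [
    # Step 1: filter out all transactions with Property Type of Other
    {"$match": {"property_type": {"$ne": "O"}}},
    # Step 2: group using Year and Month of Transaction
    {"$group": {
        "_id": {"year": {"$year": "$date"}, "month": {"$month": "$date"}},
        # Step 3: standard deviation in price, average price and counts
        "std_price": {"$stdDevPop": "$price"},
        "avg_price": {"$avg": "$price"},
        "count": {"$sum": 1},
    }},
    {"$sort": {"_id.year": 1, "_id.month": 1}},
]

results = list(prices.aggregate(pipeline))

The per-category pipeline is the same, with the ‘Price Paid Data Category Type’ field added to the $group _id.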

Results:

In all graphs Orange points represent Category B related data, Blue represents Category A related data and Green represents a combination of both the Categories.

Transaction Counts

Price Paid Data Category A/B Transaction Count


Category B transactions form a small percentage of the overall transactions (approx. 5-8%).

As the Category B data starts from October 2013, we see a rapid initial increase in Category B transactions, which then settles to a steady rate until 2017, where transactions start falling as it becomes less lucrative to buy a second house to generate rental income. There is a massive variation in the overall and Category A transaction counts, but here as well we see a downward trend in 2017.

We can also see the sharp fall in transactions due to the financial crisis around 2008.


Average Price

Price Paid Data Category A/B Average Price


Here we find an interesting result. Category B prices are consistently lower than pure Category A prices. But given the relatively small number of Category B transactions, the average price of the combined transactions is fairly close to the average price of Category A transactions. This also seems to point to the fact that in the case of buy-to-lets, repossessions and power of sale transactions the price paid is below the Category A average. Several reasons could exist for such a result:

  1. People buy cheaper properties as buy-to-let and use more expensive properties as their main residence.
  2. Under stressful conditions (e.g. forced sale or repossession) there is urgency to sell and therefore full market rate may not be obtainable.

Standard Deviation of Prices

Price Paid Data Category A/B Price Standard Dev.


The variation in the price for Category B properties is quite high when compared with Category A (the standard price paid transaction). This can point to a few things about the Category B market:

  1. A lot more speculative activity is carried out here therefore the impact of ‘expectation’ on price paid is very high – particularly:
    1. ‘expected rental returns’: The tendency here will be to buy cheap (i.e. lowest possible mortgage) and profit from the difference between monthly rental and mortgage payments over a long period of time.
    2. ‘expected profit from a future sale’: The tendency here will be to keep a shorter horizon – buy cheap, then renovate and sell at a higher price, either through direct value-add or because of a natural increase in demand.
  2. For a Standard transaction (Category A) the incentive to speculate may not be present as it is a basic necessity.

Contains HM Land Registry data © Crown copyright and database right 2017. This data is licensed under the Open Government Licence v3.0.

UK House Sales Analysis

I have been looking at house sales data from the UK (actually England and Wales). This is derived from the Land Registry data set (approx. 4 GB), which contains all house sales data from the mid-1990s. The data contains full address information, so one can use geo-coding to get the location of the sales.

Sales Density Over the Years

If we compare the number of sales over the years an interesting picture emerges. Below is the geographical distribution of active regions (w.r.t. number of sales).

In the years 2004-2007 there is strong activity in the housing market – this is especially true for London (the big patch of green), the South coast and the South West of England.

The activity penetrates deeper (look at Wales and the South West) as saturation starts to kick in. Then the financial crisis hits and we can immediately see a weakening of sales across England and Wales as it becomes more difficult to get a mortgage. The market then shows its first signs of recovery, especially around London, and the recovery starts to gain momentum, especially outside London.

The recovery becomes fairly widespread thanks to various initiatives by the Government, rock-bottom interest rates and a generally positive feeling about the future. Then Brexit and other factors kick in – the main issue is around ‘buy-to-let’ properties, which are made less lucrative by a three-pronged attack: an increase in stamp duty on a second house, the removal of tax breaks for landlords and a tightening of lending for a second home (especially interest-only mortgages).

Finally, in 2017 one can see that the market is again cooling down. The latest data suggests house prices have started falling once again, and with the recent rise in interest rates, landing a good deal on a mortgage will become that much more difficult.

Average House Prices

The graph above shows how the average sale price has changed over the years. We see there is a slump in prices starting from 2017. It will be interesting to see how house prices behave as we start 2018. It will be a challenge for people to afford higher mortgages as inflation outstrips income growth; this is especially true for first-time buyers. Given the recent bonanza of zero percent stamp duty for first-time buyers, I am not sure how much of a positive impact this will have.

Returns on Properties

The graph above shows how the returns and risks associated with a house change with the number of years it is held. It is clear that it is easier to get a return when a house is held for at least 5 years; below that there is a risk of losing money on the property. Properties resold within two years are the most likely to make a loss. This also ties in with a ‘distress’ sale scenario, where the house is sold without waiting for the best possible offer, or with times of slowdown, when easy-term mortgages are not available.

Number of Times Re-sold

The graph above shows the number of times a house is re-sold (vertical) against the number of years it is held for before being re-sold. Most houses are re-sold within 5 years. But why is there a massive spike where houses are re-sold within 2 years? One possible explanation is that these are houses that are bought by a developer, improved and then re-sold within a year or so.

House Transactions by Month of Year

Transaction by month

What is the best time of the year to sell your house? Counting the number of transactions by month (figure above), we can see the number of transactions increases as Spring starts and continues to grow till the end of Summer. In fact, 60% more houses are sold in the June–August period than in January–March.

Transactions tend to decrease slightly as Autumn starts and fall off towards the end of the year. This is expected, as people would not want to move right after Christmas or early in the new year (winter moves are difficult!).

Infrastructure

I have used Apache Spark (using Java) to summarise the data from approximately 4 GB down to 1-1.5 GB CSV files, and then Python to do the next round of aggregations and to generate the plots.
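
As a rough sketch of the Python stage (the file and column names here are assumptions), the month-of-year count from the previous section could be reproduced with pandas like this:

import pandas as pd
import matplotlib.pyplot as plt

# Summarised CSV produced by the Spark job - file and column names are assumed
sales = pd.read_csv("sales_summary.csv", parse_dates=["date"])

# Count transactions by calendar month (1 = January ... 12 = December)
by_month = sales.groupby(sales["date"].dt.month).size()
plt.bar(by_month.index, by_month.values)
plt.xlabel("Month of year")
plt.ylabel("Number of transactions")
plt.show()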

 

Next step will be to incorporate some Machine Learning into the process.

5G Networks – Just Over the Horizon

Introduction

How many times have you heard yourself say ‘this call might fail because I am boarding a bus/train/aircraft’? How many times have you tried making a call in a busy area and found that the call does not get through? How many times have you lost mobile signal on a highway and not been able to make a call, let alone access 3G/4G data? How many times have you struggled to send an email as you go in and out of a metro (underground) station? And how many times have you screamed silently because connecting to ‘free’ wifi is harder than learning how to fly a plane?

Finally how confident are you that high data rate services such as video calls, live streaming, YouTube etc. will continue to work as you commute, attend events (such as concerts), travel or just take a walk in a mall (reinforced concrete and steel are bad for mobile signals!).

5G networks aim to address many of these problems ‘out of the box’.

There are several major projects underway all over the world to produce specifications, proofs of concept, and commercial and technology test beds. The European Commission is heavily involved via the 5G-PPP (Public Private Partnership) initiative, which also means there is a decent amount of funding available.

So what is 5G? How is it different from 4G? What are its major use cases? I will attempt to answer some of these questions in this post.

What is 5G?

Firstly while 5G is obviously about connecting wireless smart devices, it involves a whole lot more. Unlike our familiar 4G/3G network which is mostly restricted to the wireless domain, 5G aims to be a fusion of wireless and wired. The reason I call it a fusion is because the main thrust of 5G is towards removing boundaries between different domains (mobile, wifi, wired broadband) from the end user perspective and provide seamless access.

Seamless, On-demand Services

To enable seamless access for the user, all the various network functions and resources need to be packaged as a product, made accessible through a standard interface. On top of this you need a catalogue based mechanism which effectively allows stitching together of these resources and functions to provide a service to the end user.

Think of it like preparing a new dish. Once you get the recipe, you go to your local supermarket and gather the ingredients (resources) from a clearly laid out aisle. Then you come home and gather various cooking implements (functions) to process the ingredients. Finally you follow the recipe and transform the ingredients using the cooking implements into a dish – which you serve to the end user.

This is broadly speaking what a 5G network will attempt to do – while keeping track of the Quality of Service and ensuring Service Levels do not fall below an advertised limit. This equates to maintaining food quality at a certain advertised level irrespective of whether you are cooking at a campsite or in your own kitchen or in a restaurant.

For example, if you are attempting to watch an HD video stream while on the move, all the resources and functions you need (and this is not a static set, as you may move in and out of various radio access technology domains such as 3G, 4G and Wifi) should be provided at any given point in time so that there is no disruption in the video stream.

Dawn of IoT

5G networks are also being designed, from the ground up, to support so called ‘machine type communication’. Machine type communication is characterised by a very large number of simple low-power sensors that require a data-link to push data to a gateway server. The data-link is usually wireless and often has low-bandwidth requirements. Several specialised sensor-gateway protocols are being developed (e.g. LoraWAN), but at a given level all that traffic will end up using 5G network links.

There are other machine type communication requirements that behave more like humans i.e. they require always on (mission critical) connections. For example an autonomous car may use an active data-link to pull down maps, upload sensor data or initiate transactions on behalf of the driver (e.g. pay tolls).

Current networks will find it almost impossible to maintain and service such a large number of hosts with such a wide spectrum of requirements.  In fact this is not a trivial use-case.

 

 

Comparing 5G with 4G

In this section I will highlight the principle differences between 5G and 4G and what features we can look forward to in the next decade:

End to End Latency: Latency is the time taken for a packet to travel from its source to its destination – the lower the better. From 25 milliseconds on 4G to under 5 milliseconds on 5G (5 times lower).

Reliability: The percentage of packets that successfully pass through the network – in this case the higher the better. From 99.99% on 4G to 99.999% on 5G, which means only 1 in 100,000 packets is lost.

Service Deployment Time: The time to set up a new service should be as short as possible, thereby allowing a faster time to market; basically, no one likes to wait! The target is to reduce it from the current 4G wait of 90 days to 90 minutes for 5G.

Energy Efficiency: Energy consumption needs to be brought down to 10% of current levels! This is not an easy task and requires a new generation of hardware and software systems.

Device Density: Current networks (4G) support up to 1000 devices per sq. km. but for 5G networks the target density is 1 million per sq. km. Before you start asking how you could fit a million devices per sq. km (maybe if everyone is pushed underground because of nuclear war) remember the machine-type communication use-case with hundreds of thousands of sensors in a city centre environment which contains high density business spaces.

Mobility: Everyone loves fiddling with their phones while commuting. But the second you step on a train or enter an underground system things start going bad unless there is train based ‘free wifi’ or mobile signal boosters are installed. 4G works fairly well for up to 100-150 kmph speeds  (e.g. local trains) but fails badly as you get on the super-fast trains (e.g. intercity express trains that run above 200 kmph). 5G networks are being designed to support ground transport speeds of 500 kmph.

Peak Data Rates: This is a fairly straightforward one: 4G supports up to 100 Mbps and 5G aims to kick this up to 10 Gbps. The one thing to note is that this is an ‘up to’ measure, so not all users or locations will get the maximum limit. In urban locations a guaranteed minimum speed of 50 Mbps has been promised.

High Data Density: To support the large number of human and machine hosts concentrated in a small area, the 4G data density of 10 Gb/s/sq. km needs to increase to 10 Tb/s/sq. km.

Universal Provisioning: To improve services in rural or so-called low ARPU (Average Revenue Per User/Unit) areas. This is very important not just for the people living in rural communities but also for service continuity (e.g. while driving on a highway). The challenge is to find a trade-off between financial viability and service offerings in rural locations. 5G networks aim to solve this problem.

House Price and Transactions with UK Elections

We are just getting over the not-so-shocking election result in the UK (8th June 2017).

I wanted to look at house prices and how they are affected by election results.

The graphs below plot House Price/ Number of Transactions against date (blue dots). The data is averaged over a month and is normalised to 1.0.

The vertical lines represent UK general elections, with blue lines representing clear results (a clear majority) and black lines representing a hung Parliament. There is also a black line (2nd from right) that represents the EU Referendum (‘Brexit’).

The orange dots represent GBP (Sterling) performing against INR (Indian Rupee) and CNY (Chinese Yuan). The data is daily average normalised to 1.0.
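
As a rough illustration of the data preparation (the file and column names are assumptions, and the normalisation shown here simply divides by the first value so the series starts at 1.0 – dividing by the maximum would work just as well):

import pandas as pd

# Assumed file and columns: one row per transaction with a date and a price
sales = pd.read_csv("price_paid_summary.csv", parse_dates=["date"])

# Average over each calendar month, then normalise the series to 1.0
monthly_avg = sales.set_index("date")["price"].resample("M").mean()
normalised = monthly_avg / monthly_avg.iloc[0]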

We can see house prices grow aggressively after clear results. The period from 2008 onward is the ‘financial’ crisis era which is further complicated by a hung Parliament in 2010. The actual recovery takes a few years and by 2014 the boom times are back! The growth is further enhanced by a Conservative majority in 2015.

It is too early to see the impact of Brexit on the housing market but as far as GBP goes there has been a fall against all major currencies.

This means investment into the UK housing market is made cheaper for ‘international’ buyers. The growth in house prices is compensated by the fall in the pound (we can see this by the relative falls in the two graphs).

Already the increase in house prices is cooling off (prices are falling in many regions where they were over-inflated to begin with). With the messy general election of 2017 increasing the uncertainty, especially around Brexit, house prices driven by internal demand should decrease or flatten out; we can already see this starting. People might rush in to lock in their mortgages (thereby boosting short-term demand) as the Bank of England has indicated a rise in interest rates in the near future.

What happens if we look at the number of transactions? The normalised graph above shows that during the financial crisis era transactions fell sharply, then began to revive (which correlates with the rise in house prices). The strong position of the Conservatives further supported the market.

But as soon as the stamp duty increase came into the picture the number of transactions started falling, and from ‘Brexit’ up to the 2017 General Election we can see a sharp fall in transactions.

All of these things indicate that people are not sure about what will happen in the future and so are not willing to take on risky positions.


Stamp duty change (1st April 2016)

A final interesting titbit – Why is there a massive spike in transactions in a subdued period of house sales (the red arrow)? And no this is not an error! The month is March 2016 – and the spike is there because stamp duty changes were being introduced from 1st April 2016 which meant buying a second home (without selling the first one) would become a lot more expensive!

[This analysis uses the Land Registry data set which is processed using Apache Spark, Python was used to further process and plot the data]

Tic Tac Toe: Statistical Analysis with arbitrary Grid Size

The game of tic tac toe (noughts and crosses) is a very simple yet interesting game.

If you don’t know anything about the game visit this link first: https://en.wikipedia.org/wiki/Tic-tac-toe

Some terms I will be using in this post:

  • Grid – the x by x square grid used to play the game, most common size is 3 by 3
  • Grid Reference – the reference to each entry (or slot) in the Grid; for an x by x grid there will be x-squared grid references, therefore for 3 by 3 there are 9 references and for 4 by 4 there are 16
  • Grid Value – each grid reference (or slot) can be empty, ‘x’ or ‘o’
  • Move Sequence – a complete set of moves that defines a game; for a Grid Reference length of 9 (3 by 3 grid) there are Factorial(9) possible Move Sequences. A valid example of a move sequence for a 3 by 3 grid:
    [1, 3, 4, 6, 5, 7, 9, 8, 2] : alternate moves by player 1 and 2

    Each player takes a symbol and sticks with it (‘x’ or ‘o’). In my post Player 1 always uses ‘o’ and Player 2 uses ‘x’; this choice has no impact on the final result

  • Set of Move Sequences – a set of move sequences; these can be the full set (all possible moves) for grids of size 3 by 3 (2 by 2 is a trivial case, as we shall see). As we move to larger grids the total set of possible moves increases at a factorial rate, therefore we can only sample from the full set of move sequences.
    Grid size, Number of Grid Slots, Total number of move sequences
    2x2, 4 slots, 24 move sequences
    3x3, 9 slots, 362880 move sequences
    4x4, 16 slots, >20 trillion move sequences

The interesting thing about regular tic tac toe is that it is deceptively simple as a game with straightforward rules, but it can throw up some surprising results. The other interesting point is the rapidly exploding set of possible move sequences.

The Target

How do we analyze games, develop strategies and generally have some fun with this ‘friendly’ problem? Do the strategies remain the same as we scale up the grid?

Approach

I built a tic tac toe simulator in Python to generate move sequences and process them to analyse the final result.

I also used MatplotLib for plotting results and AnyTree for generating brute force move sequences. It is very easy to extend the code and incorporate optimum strategy results to build a tic tac toe bot.

The Python file is attached at the end of this post.

Data Collection

The process of collecting data is very simple.

A set of move sequences is generated and each sequence is processed to determine the final result: for the 3 by 3 grid, brute force generates the complete set of move sequences, while for larger grids we sample (size: 10000) from all possible move sequences.
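
The attached Python file contains the actual simulator; as a condensed sketch of the core idea (assuming a win requires a full row, column or diagonal of the grid width), processing a single move sequence looks roughly like this:

import numpy as np

def winner(move_sequence, width):
    # Replay the sequence: player 1 = +1, player 2 = -1, empty = 0
    grid = np.zeros((width, width), dtype=int)
    for i, slot in enumerate(move_sequence):              # slots are numbered 1 .. width*width
        row, col = divmod(slot - 1, width)
        grid[row, col] = 1 if i % 2 == 0 else -1
        # Check every row, column and both diagonals for a completed line
        lines = list(grid) + list(grid.T) + [grid.diagonal(), np.fliplr(grid).diagonal()]
        for line in lines:
            if abs(line.sum()) == width:
                return 1 if line.sum() > 0 else 2         # 1 = first mover wins, 2 = second mover wins
    return 0                                              # no line completed: a draw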

Sampling

The sampling is carried out using a Markov Chain – starting with an initial move sequence (a list of grid references or slots – for 3 by 3 this will be [1, 2, 3, 4, 5, 6, 7, 8, 9]) we (with uniform probability) pick any two moves and switch them. Thus the next state depends only on the previous state.

As an example:

Pick the first and last move to switch.

Sequence at Step t: [1, 3, 4, 6, 5, 7, 9, 8, 2]
Sequence at Step t+1: [2, 3, 4, 6, 5, 7, 9, 8, 1]

The sample size for grids larger than 3 by 3 is 10000.
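
A condensed sketch of this swap-based sampler (again a simplification of mine, not the exact code in the attached file):

import random

def sample_sequences(width, n_samples):
    seq = list(range(1, width * width + 1))        # initial move sequence, e.g. [1, 2, ..., 9] for 3 by 3
    samples = []
    for _ in range(n_samples):
        i, j = random.sample(range(len(seq)), 2)   # pick two positions uniformly at random
        seq[i], seq[j] = seq[j], seq[i]            # switch them: the next state depends only on the current state
        samples.append(list(seq))
    return samples

Feeding each sampled sequence through a winner check like the one sketched earlier, and tallying the results, gives the win/draw percentages used in the plots below.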

Using the sampling we can estimate the following given arbitrary grid size:

  • Is there any advantage in starting first?
  • How many games end in a draw or have second movers as winners?
  • Which are the ‘best’ starting squares?
  • How are these factors affected by grid size?

Results

I experimented with grids of size 3 by 3, 4 by 4, 5 by 5 and 6 by 6 (here we are going beyond the Age of the Universe w.r.t. number of move sequences!) and pulled out some interesting statistics!

3 by 3 Grid:

Here the first mover has a CLEAR advantage. It is not surprising, because the first mover gets 5 moves as compared to the second mover's 4. But the advantage is QUITE SIGNIFICANT.

First Mover Advantage:

  1. The First mover wins 58.4 % of the games (using brute force: 58.492%)
  2. The Second mover wins 28.8% of the games (using brute force: 28.809%)
  3. There are 12.7% instances of a Draw (using brute force: 12.698%)

2 by 2 Grid (An Interlude)

If you are wondering about the 2 by 2 grid (and why we called it a trivial case) then try out a few simple games of 2 by 2 tic tac toe as a thought experiment. In case you are impatient the 2 by 2 grid becomes a trivial case because there is no possibility of a draw or the second mover winning. The first mover wins in all 24 move sequences! You could brute force it on a piece of paper.

3 by 3 Grid Plots

The graph below shows a 3 by 3 grid histogram of winning percentages for First Movers (Orange), Second Movers (Blue) and Draws (Green).

3×3 Grid: First movers in Orange, Second Movers in Blue, Draws in Green

The above histogram clearly shows the difference between the three.

Strategy for Tic Tac Toe!

For the first mover the best starting square by far is 5 (right in the middle, it opens up 4 winning lines – the most of any starting square; the corners open up 3 and the edges only 2). The second best option then becomes the corners (1, 3, 7, 9).

For the second movers the best starting squares are 2, 4, 6, or 8 (a diamond pattern).

4 by 4 Grid:

When we increase the size to a 4 by 4 grid we get 16 slots with > 20 trillion move sequences. It is difficult to analyse this (and probably pointless) using the brute force approach. To do this quickly we can use the sampling approach where we randomly sample move sequences (as described before) and measure how many of these lead to draws, first mover wins and second mover wins. This will obviously be an approximate measure.

The result is presented in the histogram below. Once again First movers in Orange, Second Movers in Blue and Draws in Green.

This time we see that there is a high chance of a draw (~42%); first movers still win more than second movers, but the difference is less dramatic (~31% for first movers and ~26% for second movers). We also see that the first and second mover distributions are starting to overlap at the edges.

4×4 Grid: First movers in Orange, Second Movers in Blue, Draws in Green

5 by 5 Grid:

For the larger 5 by 5 grid it is impossible to do a brute force calculation, as the full number of move sequences reaches a stunning 25 factorial (25!).

Using the same sampling approach we can again plot the percentages as a histogram.

As expected, the chance of a draw has now increased to ~60%! First movers win only about 25% of the time, whereas second movers win only about 13% of the time.

5×5 Grid: First movers in Orange, Second Movers in Blue, Draws in Green

6 by 6 Grid:

Finally, doing the same for a 6 by 6 grid, we find that there is a ~75% chance of a draw! There is also little to no difference between first movers and second movers; both are in the 12-15% range. Thus as we increase the grid size it becomes more difficult to win and there is no real advantage in moving first.

6×6 Grid: First movers in Orange, Second Movers in Blue, Draws in Green

Next Steps:

Obviously the experiments given in this post are not the final word. They can be further improved by implementing parallel processing so that we can increase the sample size (especially for larger grids).

To answer the questions we started with:

  • Is there any advantage in starting first? – Yes, definitely! Right up till grid size of 5 by 5.
  • How many games end in a draw or have second movers as winners? – More games end in a draw as the grid size increases. It is almost always bad to be the second mover!
  • Which are the ‘best’ starting squares? – For the 3 by 3 Grid we have:
    • For the first mover the best starting square by far is 5 (right in the middle, it opens up 4 winning lines – the most of any starting square; the corners open up 3 and the edges only 2). The second best option then becomes the corners (1, 3, 7, 9).
    • For the second movers the best starting squares are 2, 4, 6, or 8 (a diamond pattern).
  • How are these factors affected by grid size? – As the grid size increases a draw becomes more likely.

Larger Sample Sizes:

What if we increase the sample size from 10000 to 100000 (a factor of 10)?

Let us see what happens to the spread of the first movers, second movers and draws in case of 5 by 5 Grid and 4 by 4 Grid:

Grid Width 5×5: Large Sample, First movers in Orange, Second Movers in Blue, Draws in Green

Grid Width 4×4: Large Sample, First movers in Orange, Second Movers in Blue, Draws in Green

The spread is now far less compared to the 10000-sample graphs. This means we get the percentage values with a higher degree of accuracy. The distributions look more like the Burj Khalifa than the Shard!

 

Tic Tac Toe python code file can be found HERE!

Raspberry Pi Cluster and Apache Spark!

So over the Christmas holidays I have been busy playing with my 4 x Raspberry Pi 3 (Model B) units which I have assembled into a stack. They each have a 16 GB Memory Card with Raspbian.

Spark Pi Cluster


The Spark Master is running on a NUC (the Spark driver program runs there or I simply use the ‘spark-shell’).

If you want to make your own cluster here is what you will need:

  • Raspberry Pi 3 Model B (I bought 4 of them – just the Pi’s – don’t bother with the ‘Kit’ because you won’t need the individual cases or power supplies).
  • Raspbian on a memory card (16 GB will work fine) for each Pi.
  • A stacking plate set (one per Raspberry Pi to mount it) and one pair of ‘end plates’. This acts as a ‘rack’ for your Pi cluster. It also makes sure your Pi boards get enough ventilation and you can place the whole set neatly in a corner instead of having them lying around on the dining table!
  • Multi-device USB power supply (I would suggest Anker 60W PowerPort with 6 USB ports – which can support up to 6 Pi 3’s) so that you end up with one power plug instead of one plug per Pi.
  • To connect the Pi boards to the Internet (and to each other – for the Spark cluster) you will need a multi-port Gigabit switch – I would suggest buying one with at least 8 ports as you will need 1 port per Pi and 1 port to connect to your existing network.
  • A wireless keyboard-trackpad to setup each Pi (just once per Pi).
  • A single HDMI cable to connect with a TV/Monitor (just once per Pi).

Setting up the Pi boards:

Once you have assembled the rack and mounted the boards, install the memory cards on all the boards and connect them to the power supply and the network. Wait for the Pi boards to boot up.

Then one Pi at a time:

  • Connect a keyboard, mouse and monitor – ensure the Pi is working properly then:
    • Set hostname
    • Disable Wireless LAN (as you have Ethernet connectivity, which is more stable)
    • Check SSH works – this will make sure you can remotely work on the Pi

Raspberry Pi Cluster Image

Once all that is done and you can SSH into the Pi boards – time to install Spark:

Again one Pi at a time:

  • SSH into the Pi and use curl -O <spark download url> to download the Spark tar.gz
  • tar -xvf <spark tar.gz file> to extract the tar.gz, then place the extracted folder in a standard location (I use ‘/spark/’ on all the Pi boards)
  • Make sure correct permissions are assigned to the spark folder
  • Add the master machine hostname to the /etc/hosts file
  • Edit your ~/.bashrc and add the following: export SPARK_HOME=<the standard location for your spark> (no spaces around the ‘=’)

Similarly install Spark on a node which you will use as the ‘spark cluster master’ – use the same standard location.

Start up the master using Spark's ‘start-master.sh’ script. If you go to http://<IP of the Master Node>:8080/ you should see the Spark web page with the status of the Workers (empty to start with) and various other bits of useful information, such as the Spark master URL (which we will need for the next step), the number of available CPUs and application information. The last item – application information – is particularly useful for tracking running applications.

SSH into each of the Pi boards and execute the following: ‘start-slave.sh spark://<IP of the Master Node>:7077’ to convert each Pi board into a Spark slave.

Now if you look at the Spark webpage you will see each of the Slave nodes up (give it a couple of minutes) and you will also see the cluster resources available to you. If you have 4 Pi boards as slaves you will see 4 * 4 = 16 Cores and 4 * 1 GB = 4 GB Memory available.

Running Spark applications:

There are two main things to remember when running a Spark application:

  1. All the code that you are running should be available to ALL the nodes in your cluster (including the master)
  2. All the data that you are using should be available to ALL the nodes in your cluster (including the master).

For the code – you can easily package it up in the appropriate format (language dependent – I used Java so I used Maven to build a JAR with dependencies) and network share a folder. This reference can be used when using the spark-submit command (as the location of the application package).

For the data – you have two options – either use a network share as for the code or copy the data to the SAME location on ALL the nodes (including the master). So for example if on the master you create a local copy of the data at ‘/spark/data’ then you must use the SAME location on all the Pi boards! A local copy is definitely required if you are dealing with large data files.

Some tests:

For my test I used a 4 GB data file (text-csv) and a simple Spark program (using ‘spark-shell’) to load the text file and do a line count.
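
Purely for illustration, the same line count expressed in PySpark looks like this (I actually used the Scala spark-shell; the master URL placeholder matches the one above and the file name is a stand-in):

from pyspark.sql import SparkSession

# Point the session at the Pi cluster master (the URL shown on the Spark web page)
spark = SparkSession.builder \
    .master("spark://<IP of the Master Node>:7077") \
    .appName("line-count") \
    .getOrCreate()

# The file must exist at the same path on every node (or on a network share)
count = spark.sparkContext.textFile("/spark/data/big_file.csv").count()
print("Lines:", count)

spark.stop()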

1: Pi Cluster (4 x Raspberry Pi 3 Model B)

  • Pi with Network shared data file: > 6 minutes (not good at all – this is just a simple line count!)
  • Pi with local copies of the data file: ~ 51 seconds (massive difference by making the data local to the node)

2: Spark standalone running on my laptop (i7 6th Gen., 5600 RPM HDD, SATA3 SSD, 16 GB RAM)

  • Local data file on HDD: > 1 min 30 seconds (worse than a Pi cluster with locally copied data file)
  • Local data file on SSD: ~ 20 seconds (massive difference due to the raw speed of the SSD!)

Conclusion (Breaking the Cluster):

I did manage to kill the cluster! I set up a more complicated data pipeline which does grouping and calculations using the 4 GB data file. It runs within 5 minutes on my laptop (Spark local). The cluster collapsed after processing about 50% of it. I am not sure if the issue was related to the network (as a bottleneck) or just the Pi boards not being able to take the load. The total file size is greater than the total available memory in the cluster (some RAM is required for the local OS as well).

So my Spark cluster is not going to break any records; in fact I would be better off using Spark standalone on my laptop if it is a one-shot job (i.e. process a large data file and store the results somewhere).

Things would get interesting if we had to do this once every few hours and could automate the ‘local data copy’ step – which should be fairly easy to do. The other option is to create a fast network share (e.g. using SSDs).

What next:

Some nice project which would suit the capabilities of a Pi cluster? Periodic data processing/stream processing task? Node.JS Servers? Please comment and let me know!

Pokemon Go! Evolve vs Transfer

Each type of Pokemon needs a certain number of candies (of a compatible type) to evolve to the next level. Usually you need either 12, 25, 50, 100 or 400 (Magikarp) candies to evolve to the next level.

The exact number depends on the type of the Pokemon as well as its current evolution level. For example Pidgey to Pidgeotto requires 12 Pidgey candies, whereas Pidgeotto to Pidgeot requires 50.

When looking to evolve Pokemon we often need to ‘transfer’ a few back to the Professor to earn candies before we have enough for the evolution. This is especially true in two cases:

a) Uncommon types (dependent on location etc.): where you will end up holding a far larger number of that type than you have candies to evolve. For example, in our area there are very few Machop, and for the first evolution you need 25 Machop candies. Thus I would need to catch 9 Machop before I could evolve one! But if I were to transfer some, I could evolve after catching 7 (giving me 7*3 = 21 Machop candies) and then transferring 4 of them (giving me 4 more Machop candies).

b) Very common types (to maximise evolutions, especially if you have a lucky egg activated): where you may have a few hundred Pidgey (again, far more of that type than you will be able to evolve).

In both cases you need a way to calculate – given the current number of a particular type (e.g. Pidgey and Pidgeotto count as different types here, even though they are part of the same evolution chain), the number of candies available and the number of candies per evolve – how many extra evolutions you can gain by transferring some Pokemon.

The formula is:

ToInteger[(Nt + C) / (1 + Co)] – Nc = Ne

Nt = Number of currently present Type

C = Number of currently available Candies

Co = Number of Candies required for next Evolve

Nc = Number of possible Evolves without Transferring

Ne = Number of extra evolutions possible by transferring Pokemon

For example:

Let us assume you have 103 (C) Eevee candies. Now each evolution of Eevee (which has only a single level) requires 25 (Co) Eevee candies. Let us assume we have 30 (Nt) Eevee with us.

This gives:

Nc = ToInteger(103/25) = 4

ToInteger[(Nt + C) / (1 + Co)] = ToInteger[5.11] = 5, thus Ne = 5 – 4 = 1

Which means we can return Eevees to get one additional evolve!

 

Now we need to find out exactly how many Eevees we need to return to achieve that one additional evolve – while making optimal use of existing Eevee candies. The so called Equilibrium condition is that we have no un-evolved Eevees or unused Eevee candies after the evolutions.

The formula for Number of Returns (Nr):

Nr = [(Ne + Nc)*Co] – C

From the example above we have: Ne = 1, Nc = 4, Co = 25 and C = 103, which gives:

Nr = [(1+4)*25] – 103 = 125 – 103 = 22

Thus to make optimal use of existing Eevee and Eevee candies we should transfer 22 out of 30 Eevees and utilize the candies gained from transfer to evolve the remaining Eevees.

The result is not at Equilibrium because we will be left with 3 Eevees after we return 22 and evolve 5 [30 – (22+5) = 3].
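
The whole calculation fits into a few lines of Python (a quick sketch using the symbols defined above):

def evolve_plan(nt, c, co):
    nc = c // co                        # Nc: evolves possible without transferring
    ne = (nt + c) // (1 + co) - nc      # Ne: extra evolves gained by transferring
    nr = (ne + nc) * co - c             # Nr: transfers needed to fund all the evolves
    return nc, ne, nr

# Worked example from above: 30 Eevee, 103 candies, 25 candies per evolve
print(evolve_plan(30, 103, 25))         # -> (4, 1, 22)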

Enjoy!

Make Your Own IoT Twitter Bot

Internet of Things (IoT) is one of the big buzzwords making the rounds these days.

The basic concept is to have lightweight computing platforms integrated with everyday devices, turning them into ‘smart’ devices. For example, take your good old electricity meter, embed a computing platform in it and connect it to the Internet – you get a Smart Meter!

Basic ingredients of an IoT ‘device’ include:

  • A data source, usually a sensor (e.g. temperature sensor, electricity meter, camera)
  • A lightweight computing platform with:
    • low power requirements
    • network connectivity (wireless and/or wired)
    • OS to run apps
  • Power connection
  • Data connection (mobile data, ethernet or WiFi)

In this post I wanted to build a basic IoT sensor using an off-the-shelf computing platform to show how easy it is!

This is also to encourage people to do their own IoT projects!

Raspberry Pi and Tessel2 platforms are two obvious choices for the computing platform.

I decided to use the Tessel2, which is a lot less powerful than the Pi (sort of like comparing a Ford Focus with a Ferrari F40).

Tessel2 has a 580MHz processor, 64MB RAM and 32MB flash (just for comparison Pi3B has a Quad Core 1.2GHz processor, 1GB RAM) – both have Wifi and Ethernet built in.

The Pi comes with a Debian-based OS with full desktop-class capabilities (GUI, applications etc.), whereas the Tessel2 just supports Node.JS based apps (it runs OpenWrt) and has no GUI capabilities. Therefore it is a lot closer to an IoT platform in terms of capabilities, power requirements and form factor.

Temperature Tweeter Architecture


Architecture

Computing Platform: Tessel2

Tessel2 has a set of basic hardware features which includes USB2.0 ports, sensor sockets (where you can plug in different modules such as temperature, GPS, bluetooth) and one Ethernet socket.

Since there is no UI for the Tessel2 OS you have to install the ‘t2’ command line tool to interact with your tessel.

The Tessel2 website has an excellent ‘first steps’ section here.

If you are blessed with a Windows 10 based system you might have some issues with detecting the Tessel2. One solution is to install ‘generic USB drivers’ here. But Google is your friend in case you run into the dreaded: ‘Detected a Tessel that is booting’ message.

Data Source: Climate Sensor Module

The sensor module we use as a data source for this example is the climate sensor which gives the ambient temperature and the relative humidity. The sensor module can be purchased separately and you can connect up to two modules at a time.

Power and Data:

As the sensor is based indoors we use a standard micro-USB power supply. For external use we can use a power bank. The data connection is provided through a wired connection (Ethernet) – again as we are indoors.

The Node.JS Application

Start by creating a folder for your Tessel2 app and initialise the project by using the ‘t2 init’ command within that folder.

Create the node.js app to read data from the sensor and then use the ‘twitter’ api to create a tweet with the data. The application is really simple but shows off the power of Node.JS and the large ecosystem of libraries available for it.

One good thing about the Tessel2 is that because it is such a lightweight platform you really cannot run full-sized Node.JS apps on it. As a comparison, a single Node.JS instance can use up to 1.8GB of RAM on a 64-bit machine, whereas the Tessel2 has only 64MB of RAM in total for everything that is running on it!

The most common type of application you will find yourself writing in the IoT space will involve reading some value from a sensor or attached device, then either exposing it via a REST server running on the Tessel2 itself (pull) or calling a remote server to write the data (push).

In other words you will just end up writing pipelines to run on Tessel2 which read from a source and write to a destination. You can also provide support for cross cutting concerns such as logging, authentication and remote access.

If you want to Tweet the data then you will need to register a Twitter account and create a ‘Twitter app’ from your account settings. All the keys and secrets are then generated for you. The Twitter API for Node.JS is really easy to use as well. All the info is here.

The bot can be seen in action here: https://twitter.com/machwe_bot

Few Pointers

Don’t put any complex calculations, data processing or analytics functionality in the pipeline if possible. The idea is that IoT devices should be ‘deploy and forget’.

Be careful of the libraries you use. Certain objects such as ‘clients’ and ‘responses’ can be quite large, considering you have less than 64MB of RAM to play around with. So you might want to run the program locally and profile the memory use just to be sure.

The ‘t2 run’ command allows you to test your program on the Tessel2 while getting console output in your terminal. This is an excellent way of testing your programs. Once you are ready to ‘deploy and forget’ your Tessel2, just use the ‘t2 push’ command to load your Node.JS app onto the device. Thereafter, every time the device restarts it will launch your app.

Code

This is the code for the ‘Climate Tweeter’:

‘npm install’ will get you all the dependencies.

var Twitter = require('twitter');
 
var tessel = require('tessel');
var climatelib = require('climate-si7020');
 
// Init Climate Module
var climate = climatelib.use(tessel.port['B']);

// Tracks whether the climate module is ready to be read
var climate_status = false;
 
var data = {
  consumer_key: 'your consumer key',
  consumer_secret: 'your consumer secret',
  access_token_key: 'your access token key',
  access_token_secret: 'your access token secret'
};
 
var client = (new Twitter(data));
 
setInterval(function(){
    if (climate_status) {
        // Read the Temperature and Humidity
        climate.readTemperature('c', function (err, temp) {
          climate.readHumidity(function (err, humid) {
 
            // Output Tweet
            var output = (new Date())+',Bristol UK,Home,Temp(C):'+ (temp.toFixed(2)-5) + ', Humidity(%RH):'+ humid.toFixed(2);
 
            //Tweet to Twitter
            client.post('statuses/update', {status : output}, function(error, tweet, response) {
                    if (error) {
                        console.error("Error: ", error);
                    }
            });     
        });
    });
}},600000);
 
climate.on('ready', function () {
  console.log('Connected to climate module');
  // Climate module on and working - we can start reading data from it
  climate_status = true;
});

What next?

The interesting thing is that I did not need anything external to the Tessel2 to make this work. I did not have to set up any servers etc. I could very easily convert the device to work outdoors as well: hook up a camera (via USB) to make this a ‘live’ webcam, or attach a GPS and mobile data module with a power pack (for backup) and connect it to your car (via the power port or lighter) – and you have a car tracking device.

Enjoy!