Thursday, November 17, 2011

Areal B-Splines: Interpolating the area under a curve with smooth B-splines in Matlab

B-splines are great for smooth interpolation between points. But what if the "points" you have are not actually points, but are the area of segments under the curve?

Say for example you have a time series with oil production in 2011, on a monthly basis. From this information, you would like to generate a smooth time series that gives you the daily oil production. This is an interpolation of sorts, but not quite a standard one - what you want is for the sum of the interpolated daily oil production to add up to the monthly production: in other words, the area under the curve of instantaneous production must be equal to the monthly production that is given.

It is also an interesting technique to use when looking at forward curves, and trying to guess how the monthly price may break down into daily spot prices. For instance, is February is traded higher than January and December is traded lower than January, you will assume that if each day in January had a price, the daily price would be higher towards the end of January and lower at the start of January. You know, instead of assuming the same price over the whole month?

Does that make sense? For a bit more details on the technique, have a look at this paper.

We use a quartic spline (fourth-degree piecewise polynomial with value, first derivative and second derivative which are equal at the knots), a maximum smoothness criterion, and boundary conditions.

The graph below is created by calling the "smartSplit" function like this:
 out_values = smartSplit([0.0 15.0 20.0 26.0 30.0 31.0], [100.0, 25.0, 50.0 14.0 5.0], 0:0.05:31)  



Isn't the graph nice? The blue line represents the input to the function, if we just pasted the average of the integral over the entire interval. The green line is our interpolated line, which has the same integral as the blue curve, but has beautiful spline properties such as continuity and continuity of derivatives. In addition, it has a maximum smoothness condition.

Here is the main function in nice and simple Matlab code:

 function [piecewise_polynomial] = arealSplineInterpolate(x_values, values)   
    %% arealSplineInterpolate   
    % Creates a spline that has given integral/area under each interval of    
    % a curve.   
    %    
    % Input:   
    % - x_values: the intervals on the x-axis of the dataset to be interpolated.   
    %    Values are in increasing order. The size should be one larger   
    %    than the "values" parameter, and each two consecutive values   
    %    x_values(i) and x_values(i+1) give the interval over which   
    %    values(i) is the integral/area under the curve.   
    %    For example: [0.0 15.0 20.0 26.0]   
    % - value: the area under the curve of each interval.   
    %   For example: [100.0, 25.0, 50.0]   
    %   
    % Output:   
    % - y_values_out: the values when the resulting interpolating function is   
    %     evaluated at x_values_out.   
    % Use shorthand names   
    t = x_values;   
    m = values;   
    n = length(t);   
    a_rows = 4*(n-1);   
    a_columns = 5*(n-1);   
    A = zeros(a_rows, a_columns);   
    b = zeros(a_rows, 1);   
    % Boundary conditions   
    A(1,1) = 0;   
    A(1,2) = 0;   
    A(1,3) = 2;   
    b(1) = 0;   
    A(2,end-4) = 12*(t(end)-t(end-1))^2;   
    A(2,end-3) = 6*(t(end)-t(end-1));   
    A(2,end-2) = 2;   
    b(2) = 0;   
    A(3,1) = 0;   
    A(3,2) = 0;   
    A(3,3) = 0;   
    A(3,4) = 1;   
    b(3) = 0;   
    initial_conditions = 3;   
    % Integrals of the resulting curve must hit the given values   
    for i=1:n-1   
      row = initial_conditions + i;   
      h = t(i+1) - t(i);   
      A(row,5*i-4) = (1/5) * (h^5);   
      A(row,5*i-3) = (1/4) * (h^4);   
      A(row,5*i-2) = (1/3) * (h^3);   
      A(row,5*i-1) = (1/2) * (h^2);   
      A(row,5*i ) = h;   
      b(row) = m(i);  
    end   
    % Curve must be continuous, i.e. segments meet at knots   
    for i=1:n-2   
      row = initial_conditions + (n-1) + i;   
      h = t(i+1) - t(i);   
      A(row,5*i-4) = h^4;   
      A(row,5*i-3) = h^3;   
      A(row,5*i-2) = h^2;   
      A(row,5*i-1) = h;   
      A(row,5*i ) = 1;   
      A(row,5*(i+1)) = -1;   
      b(row) = 0;   
    end   
    % Curve must have continuous first-derivative at knots   
    for i=1:n-2   
      row = initial_conditions + (n-1) + (n-2) + i;   
      h = t(i+1) - t(i);   
      A(row,5*i-4) = 4*h^3;   
      A(row,5*i-3) = 3*h^2;   
      A(row,5*i-2) = 2*h;   
      A(row,5*i-1) = 1;   
      A(row,5*(i+1)-1) = -1;   
      b(row) = 0;   
    end   
    % Curve must have continuous second-derivatives at knots   
    for i=1:n-2   
      row = initial_conditions + (n-1) + 2*(n-2) + i;   
      h = t(i+1) - t(i);   
      A(row,5*i-4) = 12*h^2;   
      A(row,5*i-3) = 6*h;   
      A(row,5*i-2) = 2;   
      A(row,5*(i+1)-2) = -2;   
      b(row) = 0;   
    end   
    % Maximum smoothness condition   
    h_mat = zeros(5,5);   
    big_h = zeros(5*(n-1), 5*(n-1));   
    for i=1:n-1   
      h = t(i+1) - t(i);   
      h_mat(1,1) = (144/5)*h^5;   
      h_mat(1,2) = 18*h^4;   
      h_mat(1,3) = 8*h^3;   
      h_mat(2,1) = 18*h^4;   
      h_mat(2,2) = 12*h^3;   
      h_mat(2,3) = 6*h^2;   
      h_mat(3,1) = 8*h^3;   
      h_mat(3,2) = 6*h^2;   
      h_mat(3,3) = 4*h;   
      big_h(5*(i-1)+1:5*i,5*(i-1)+1:5*i) = h_mat;   
    end   
    % Solve the resulting equation   
    A_t = transpose(A);   
    zm = zeros(size(A,1), size(A_t,2));   
    pm = [eye(5*(n-1)) zeros(5*(n-1),4*(n-1))];   
    dm = inv([2*big_h A_t; A zm]);   
    bm = [ zeros(size(pm*dm,2)-size(b,1),1); b];   
    x = pm*dm*bm;   
    % Make a matlab piecewise polynomial   
    cnt = 1;   
    x2 = zeros(n-1, 5);   
    for i=1:n-1   
      x2(i,:) = x(5*i-4:5*i)';   
    end   
    piecewise_polynomial = mkpp(t,x2);   
  end   


And a simple utility function for it, which produces the graph above:
  function [out_values] = smartSplit(x_intervals, values, x_out_intervals)   
    %% smartSplit   
    % Uses the areal spline interpolation method to subdivide a list of   
    % known intervals and values into another.   
    pp = arealSplineInterpolate(x_intervals, values);   
    flc = @(x) ppval(x,pp);  
    out_values = [];   
    for i=1:length(x_out_intervals)-1  
      out_values(i) = quad(flc,x_out_intervals(i),x_out_intervals(i+1));   
    end   
    % Show a plot  
    clf()  
    hold on;  
    b1 = values./(x_intervals(2:end)-x_intervals(1:end-1));  
    b2 = out_values./(x_out_intervals(2:end)-x_out_intervals(1:end-1));  
    stairs(x_intervals,[b1 b1(end)], 'LineWidth',2)  
    stairs(x_out_intervals,[b2 b2(end)], 'g-','LineWidth',2);  
    hold off;     
  end   

Let me know in the comment fields if you have any questions, or use this for something interesting!