% LU Addendum

I was unhappy with my LUNaive script which is not at all careful about flop-count, although from an "idea" standpoint it is quite simple. It has flopcount around 10/3*n^3, about 5 times worse than the (reported) best possible result. So I went out on the internet and found a better algorithm, but was STILL getting flopcounts close to n^3.

So I wrote the following script, which is a self-calling function. It creates the first row of U and the first column of L. It then reduces the target matrix to a smaller matrix and calls itself on that.

It has the optimum reported 2/3*n^3 flopcount behavior.

It is trivial to count the steps in my script directly, which is why I wrote it this way.

The number of steps in each function call is easily seen to be

n+(n-1)*2(n-1) 

where n is the size of the matrix at this step. Each function call reduces the size by 1 so if we sum this from n=2 to n=k we get the flopcount. Using summation identities we calculate the exact count to be

n*(n+1)/2+n*(n-1)*(2*n-1)/3 -1.

%***********************************************************
function [M,L,U] = LUNaive3(M,L,U)
% LUNaive: Create an LU decomposition via naive Gauss elimination of matrix M.
% Invoke it using a single square matrix argument, as LUNaive3(M).
% This will differ from the MATLAB lu function in that it doesn't use pivoting
% Because of this the script will break if it encounters 
% a zero in a pivot position.
% input:
% M = coefficient matrix
% output:
% L = Lower Triangular left factor
% U = Upper Triangular right factor

[m,n] = size(M);

if m~=n, error('Matrix A must be square'); end
if nargin<2,L=eye(size(M)); end
if nargin<2,U=zeros(size(M)); end

lsize=length(L);

L1=L(lsize-n+1:lsize,lsize-n+1:lsize);
U1=U(lsize-n+1:lsize,lsize-n+1:lsize);

A=M;

c=A(:,1)/A(1,1);

L1(2:n,1)=c(2:n);

U1(1,:)=A(1,:);
for k=2:n
U1(k,:)=[0, A(k,2:n)-c(k)*A(1,2:n)];
end

L(lsize-n+1:lsize,lsize-n+1:lsize)=L1;
U(lsize-n+1:lsize,lsize-n+1:lsize)=U1;
M=U(lsize-n+2:lsize,lsize-n+2:lsize);

if n>2 [M,L,U]=LUNaive3(M,L,U);
end


%***********************************************************
%***********************************************************
function [M,L,U,count] = LUNaive3Count(M,L,U,count)
% LUNaive: Create an LU decomposition via naive Gauss elimination for matrix M.
% Invoke it using a single square matrix argument, as LUNaive3Count(M).
% This will differ from the MATLAB lu function in that it doesn't use pivoting
% Because of this the script will break if it encounters 
% a zero in a pivot position.
% input:
% M = coefficient matrix
% output:
% L = Lower Triangular left factor
% U = Upper Triangular right factor

% Check out its behavior by plotting c obtained in the following code.
%  for w=1:50
%  M=rand(w);
% [M,L,U,c(w)]=LUNaive3Count(M)/w^3;
%  end
%  c

[m,n] = size(M);

if m~=n, error('Matrix A must be square'); end
if nargin<2,L=eye(size(M));,end
if nargin<2,U=zeros(size(M));,end
if nargin<4,count=0; end
lsize=length(L);

L1=L(lsize-n+1:lsize,lsize-n+1:lsize);
U1=U(lsize-n+1:lsize,lsize-n+1:lsize);

A=M;

c=A(:,1)/A(1,1);

count=count+n;

L1(2:n,1)=c(2:n);

U1(1,:)=A(1,:);
for k=2:n
U1(k,:)=[0, A(k,2:n)-c(k)*A(1,2:n)];
count=count+2*(n-1);
end

L(lsize-n+1:lsize,lsize-n+1:lsize)=L1;
U(lsize-n+1:lsize,lsize-n+1:lsize)=U1;
M=U(lsize-n+2:lsize,lsize-n+2:lsize);


if n>2 [M,L,U,count]=LUNaive3Count(M,L,U,count);
end

%***********************************************************

% I ran the following:
clear c
for w=4:50
M=rand(w);
[M,L,U,count]=LUNaive3Count(M);
c(w)=count/w^3;
end
c(30:50)
% and got 

 0.6547    0.6550    0.6552    0.6555    0.6557    0.6560    0.6562    0.6564    0.6566    0.6568    0.6570
 
GREAT!!!  However .......

I checked it out with tic toc on a big matrix ....

M=rand(500);
tic
LUNaive(M);
a(1)=toc;
tic
B=LUNaive2(M);
a(2)=toc;
tic
C=LUNaive3(M);
a(3)=toc

and got the surprising

a = 2.3831    0.1516    1.4409

So my min-flop solution was a flop, 10 times slower than the arithmetic-only version which had a 50% higher flopcount!!! 

My recursive solution is profligate in use of RAM, and apparently read/write to RAM adds up too!!

Something to think about.

But ... I'm tired of this problem now. But in case you aren't, here is the function I found somewhere on the interweb that is (in a practical sense) fastest, and the flopcount version of my original LU-maker too.


%***********************************************************
function [L,U] = LUNaive2(A)
% LUNaive: Create an LU decomposition via naive Gauss elimination
% This will differ from the MATLAB lu function in that it doesn't use pivoting
% Because of this the script will break if it encounters 
% a zero in a pivot position.
% input:
% A = coefficient matrix
% output:
% L = Lower Triangular left factor
% U = Upper Triangular right factor

% The outer loop calculates L
% and the second inner loop calculates U

[m,n] = size(A);

if m~=n, error('Matrix A must be square'); end

L = eye(m); 

for k=1:m

    L(k+1:m, k) = A(k+1:m, k)/A(k,k);

    for p = k+1:m
        A(p,:) = A(p,:) - L(p,k)*A(k,:);
    end
    
end

U=A;

end
%***********************************************************
function [c, L,U] = LUNaive2Count(A)
% LUNaive: Create an LU decomposition via naive Gauss elimination
% This will differ from the MATLAB lu function in that it doesn't use pivoting
% Because of this the script will break if it encounters 
% a zero in a pivot position.
% input:
% A = coefficient matrix
% output:
% L = Lower Triangular left factor
% U = Upper Triangular right factor

% Test the order of the flopcount using the following.

%  for w=1:50
%  M=rand(w);
%  c(w)=LUNaive2Count(M)/w^3;
%  end
%  c


[m,n] = size(A);

c=0;

if m~=n, error('Matrix A must be square'); end

L = eye(m); 

for k=1:m

    L(k+1:m, k) = A(k+1:m, k)/A(k,k);

c=c+sum(length(k+1:m));

    for p = k+1:m
        A(p,:) = A(p,:) - L(p,k)*A(k,:);
        
        c=c+2*m;
        
    end
    
end

U=A;

end
%***********************************************************
%***********************************************************
function [c,L,U] = LUNaiveCount(A)
% LUNaive: Create an LU decomposition via naive Gauss elimination
% This will differ from the MATLAB lu function in that it doesn't use pivoting
% Because of this the script will break if it encounters 
% a zero in a pivot position.
% input:
% A = coefficient matrix
% output:
% L = Lower Triangular left factor
% U = Upper Triangular right factor

% The first time through the loop we calculate U
% and the second time through we calculate L
% Essentially, the second loop calculates the inverse of the 
% elementary row operations used to find U.

c=0;

[m,n] = size(A);
if m~=n, error('Matrix A must be square'); end

Aug = [A eye(size(A))];
% forward elimination to find U
for k = 1:n-1
for i = k+1:n

factor = Aug(i,k)/Aug(k,k);
c=c+1;


Aug(i,k:2*n) = Aug(i,k:2*n)-factor*Aug(k,k:2*n);
c=c+2*length(k:2*n);
end
end

U=Aug(:,1:n);

% forward elimination to find L

A=Aug(:,n+1:2*n);

Aug = [A eye(size(A))];
% forward elimination

for k = 1:n-1
for i = k+1:n

factor = Aug(i,k)/Aug(k,k);
c=c+1;

Aug(i,k:2*n) = Aug(i,k:2*n)-factor*Aug(k,k:2*n);
c=c+2*length(k:2*n);
end
end

L=Aug(:,n+1:2*n);

end
 %***********************************************************