function [p,mean1,mean2]=Pval2MeansPermutation(samp1,samp2,N)
if nargin<3, N=100000; end %default number of random permutations
% Populations in two categories cat1 and cat2 have samples taken and numerical variable values found, producing samp1 of size n from cat1 and samp2 of size m from cat2.
% We want to decide if a parameter---in this case the mean of this numerical variable---differs in the two categories.
% We calculate p(1), a Monte Carlo estimate (N random permutations are selected rather than all of them) of the probability of getting abs(mean(samp1)-mean(samp2)) as large as we actually got or larger, assuming the null hypothesis: the distributions of the numerical variable in the two categories are the same, and any variation in the estimated parameters is due to chance.
% We also calculate p(2), which estimates the probability of getting samples for which mean(samp1)-mean(samp2) is as large as our observed difference or larger, under the same null hypothesis: the distributions of the variable values in the two categories are the same, and selections are random.
% NOTE: We would never run this second (one-sided) test unless mean(samp1) actually exceeds mean(samp2): we are looking for strong evidence that the mean in cat1 is larger than the mean in cat2, and unless our samples reflect this they provide no evidence for it at all and the null hypothesis would never be rejected.
% This function takes N randomly permuted samples of the combined original numerical data and counts the fraction of cases for which the mean of the first n samples differs from the mean of the last m samples by as much or more than the differences of means of samp1 and samp2.
% If these probabilities are very small, the observed data are unlikely under the null hypothesis, which provides evidence in favor of the alternative.
% If samp1 and samp2 are small we could actually list all possible permutations and calculate the exact p(1) and p(2) directly. Our random selection of permutations will give almost identical results for typical (large) N values.
% If m=n then p(1) should be almost exactly twice p(2). Any difference from that is due to Monte Carlo variation: by chance, a few more permutations of one type were selected than of the other.
% Example data:
% samp1=[90,98,73,79,84,81,98,90,83,88,102,99,53];
% samp2=[65,79,90,75,61,85,98,80,97,75];
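% Usage sketch (a hypothetical command-line invocation with the example data above):
%   [p,mean1,mean2]=Pval2MeansPermutation(samp1,samp2,100000);
%   % p(1) is the two-sided Monte Carlo p-value; p(2) is the one-sided one.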
mean1=mean(samp1) %no semicolon: display the sample means
mean2=mean(samp2)
if mean2>mean1
    error(['Examine your data. The mean of the first sample should exceed the mean of the second to run this test. ' ...
        'If you are running a 2-sided test the order is irrelevant, so switch the order. ' ...
        'But if you are running a 1-sided test we must have the first mean exceeding the second, and your alternative hypothesis must agree that this SHOULD be true. ' ...
        'There is no need to run a 1-sided test at all if this cannot be done: the data you collected will not support your alternative hypothesis and the null hypothesis will never be rejected.'])
end
rng('shuffle') %assign time-based seed to random number generator
n=length(samp1);
if iscolumn(samp1), samp1=samp1'; end %ensure row vector
m=length(samp2);
if iscolumn(samp2), samp2=samp2'; end
data=[samp1 samp2]; %concatenate row data vectors
counttwosided=0;
countgreater=0;
diff12=zeros(1,N); %preallocate for speed
for i=1:N
    choose=randperm(n+m); %randomize indices
    perm1=data(choose(1:n)); %first n entries play the role of samp1
    perm2=data(choose(n+1:n+m)); %remaining m entries play the role of samp2
    diff12(i)=mean(perm1)-mean(perm2);
    if abs(diff12(i))>=abs(mean1-mean2)
        counttwosided=counttwosided+1; %two-sided exceedance
    end
    if diff12(i)>=mean1-mean2
        countgreater=countgreater+1; %one-sided exceedance
    end
end
p=[(counttwosided+1)/(N+1),(countgreater+1)/(N+1)];
% (1 is added to numerator and denominator because the identity permutation automatically satisfies both conditions; this guarantees the reported probability is never 0, which could happen if the original sample were disregarded.)
figure(88);
histogram(diff12,100); %null distribution of the difference of means
hold on
plot([-(mean1-mean2) -(mean1-mean2) (mean1-mean2) (mean1-mean2)],[0 N/40 N/40 0]) %bracket the observed difference at +/-(mean1-mean2)
hold off
end