#These programs were discussed in the paper (Fraser. IEEE Trans. Inf.
#Theory, 35(2):245-262, March 1989) and are a generalization of the
#program discussed in (Fraser & Swinney. Phys Rev A, 33:1134-1140, Feb. '86.)
#
#The programs are currently (July 1993) available via anonymous ftp from:
# lyapunov.ucsd.edu and
# ftp.phys-chemie.uni-wuerzburg.de
# and Fraser can be contacted at andy@sysc.pdx.edu
#
# This file is a group of files which have been specially packaged
# by bundle, a program on page 98 of Kernighan & Pike. You can read
# the file as it is and if you have a UNIX system you can execute it
# as a shell script and it will unbundle itself into separate files.
# To unbundle, remove everything above this line and then execute
# the file with sh.
#==========================
#= Here is file README =
#==========================
echo README 1>&2
cat >README <<'End of README'
Thank you for the interest you expressed in our code. The letter of which
this file is a part of should contain;
1)README ... This file.
2)polly.1 ... A standard documentation page of the included programs.
3)loopi ... A shell script. See polly.1
4)scode.c ... C program. See polly.1
5)interleave.c ... C program. See polly.1
6)polly.c ... C program. See polly.1
7)loop... An undocumented shell script
This is not exactly the same code used for the Feb. '86 Phys Rev A paper. It is
essentially the code which was suggested in the conclusion of that paper. It
can calculate the redundancy or generalized mutual information of higher
dimensional distributions. For simplicity the statistical test has been
reduced to a single test using only the current level of subdivision, i.e.
it uses eqn. 21 in the paper but does not use eqn. 22.
If you are only interested in the mutual information, call interleave and
polly with the second argument (the d value) set to 2.
End of README
#==========================
#= Here is file polly.1 =
#==========================
echo polly.1 1>&2
cat >polly.1 <<'End of polly.1'
POLLY(1) (Center for Nonlinear Dynamics U.T. Austin)
NAME
polly - estimate total redundancy of input
interleave - prepare input for polly
scode - prepare input for interleave
loopi - interactive demonstration
SYNTAX
polly n d
interleave n d
scode n
loopi
DESCRIPTION
These programs estimate the total redundancy of joint distributions
from a finite number of samples. n is the number of samples which
must be a power of 2 e.g. 1024, and d is the dimension of the joint
distribution. Each dimension of the input sample distribution
should be a seprate file consisting of unformatted single precision
floats (32 bits per number, no record markers). Each of these raw
input files must be processed by scode which changes to equiprobable
coordinates. The results from scode are combined in interleave,
and the output of interleave is processed by polly to produce a
single number which is printed on the standard output.
loopi is an interactive shell script which demonstrates these
programs and which could be modified to apply efficiently to a
particular application.
EXAMPLE
scode 16384 ix
scode 16384 iy
scode 16384 iz
cat ix iy iz|interleave 16384 3|polly 16384 3
cat ix iz |interleave 16384 2|polly 16384 2
Here rx, ry, and rz are files of unformatted floats each of which
is at least 16384 elements long. The intermediate files ix, iy, and
iz are each files of 16384 unformatted integers. The 4th line will
produce an estimate of the redundancy R(X,Y,Z), while the fifth
line will produce an estimate of R(X,Z) = I(X;Z). This demonstrates
the possible advantages of saving the intermediate files.
BUGS
If n is not a power of 2 there's no telling what will happen. If
the input files are in the wrong format, a result will be calculated
which is meaningless. To get reliable estmates for d values larger
than 2, n must be very large (millions). Scode uses nonlocal references
and will hence pagefault to death on large files (should be rewritten
using a good sorting algorithm twice).
REFERENCE
Fraser & Swinney. Phys Rev A, 33:1134-1140, Feb. '86.
Fraser. IEEE Trans. Inf. Theory, 35(2):245-262, March 1989.
Fraser. Physica D, 34D:391, 1989.
PROGRAMMER
A. M. Fraser 2/27/86
End of polly.1
#==========================
#= Here is file loopi =
#==========================
echo loopi 1>&2
cat >loopi <<'End of loopi'
echo This is a sample calling loop for the redundancy calculating programs
echo scode, interleave, and polly. The loop is clumsy and inefficient and
echo should be replaced by something taliored to your application."\n \n"
echo What file to work on? The file must be in binary float format.
read file
echo What length? Choose a power of two e.g. 16384.
read length
echo What dimension of delay vector do you want to consider?
echo Small dimensions are easier. Try 3
read ndl
T=0
while expr $T \< $ndl >>/dev/null
do
echo what delay for component $T ?
read Td
Tb=`expr $Td \* 4 + 1`
cat $file|tail +"$Tb"c|scode $length >/tmp/redun.$T &
T=`expr $T + 1`
done
echo wait for sorting to finish
wait
T=0
while expr $T \< $ndl >>/dev/null
do
cat /tmp/redun.$T >> /tmp/redun.a
T=`expr $T + 1`
done
echo begin interleave
cat /tmp/redun.a|interleave $length $ndl >/tmp/redun.b
echo begin polly
echo The result is `cat /tmp/redun.b|polly $length $ndl` bits
rm /tmp/redun*
End of loopi
#==========================
#= Here is file scode.c =
#==========================
echo scode.c 1>&2
cat >scode.c <<'End of scode.c'
/* scode.c(length)
use: cat rosx|scode 16384 >temp
takes floats in machine rep from stdin and produces
ints in machine rep on stdout. The order of the output
is the same as the order of the input.
The program accsess arrays in random
order, so for files to big to sit in memory the program
takes a very long time to finish.
*/
#include
main (argc, argv)
int argc;
char *argv[];
{
int *in, *out, i, i1, i2, j, n,
*outlm, *inlam, *inlbm, *inla, *inlb, *outl;
float *r;
n = atoi (argv[1]);
r = (float *) malloc (n * sizeof (float));
in = (int *) malloc (n * sizeof (int));
out = (int *) malloc (n * sizeof (int));
for (i = 0; i < n; i++)
in[i] = i;
fread (r, sizeof (float), n, stdin);
/* sort indices according to the size of the input floats */
for (i1 = 1; i1 < n; i1 <<= 1)
{
i2 = i1 << 1;
/* i1 is length of sorted sub-arrays i2 is length of array to be
produced */
for (j = 0; j < n; j += i2)
{
inla = &in[j];
inlam = inlb = &in[j + i1];
inlbm = &in[j + i2];
outl = &out[j];
outlm = &out[j + i2];
while (outl < outlm)
{
if (inla >= inlam)
while (outl < outlm)
*outl++ = *inlb++;
else
if (inlb >= inlbm)
while (outl < outlm)
*outl++ = *inla++;
else
if (r[*inla] < r[*inlb])
*outl++ = *inla++;
else
*outl++ = *inlb++;
}
}
outl = in;
in = out;
out = outl;
}
/* having finished the sort, invert the array in[] */
for (i = 0; i < n; i++)
out[in[i]] = i;
/* write the result out */
fwrite (out, sizeof (int), n, stdout);
}
End of scode.c
#==========================
#= Here is file interleave.c =
#==========================
echo interleave.c 1>&2
cat >interleave.c <<'End of interleave.c'
/* interleave(n,d)
interleave combines d integer files which should have been created
by scode into a single file and rearranges the bits into a form used
by polly (see polly.c for details).
ex: cat x y z|interleave 1024 3 >vect */
#include
main (argc, argv)
int argc;
char *argv[];
{
int n, n1, logn, w;
unsigned int *in, *out;
register unsigned int t1, t2;
register int k, i, j, d;
n = atoi (argv[1]); /* n is the number of vectors. n should be
a power of 2 */
d = atoi (argv[2]); /* d is the number of coordinates in a
vector */
n1 = 0;
for (i = 1; n1 <= n; i++)
n1 = 1 << i;
logn = i - 2;
n1 = 1 << logn; /* n1 is the largest power of 2 <= n */
w = (logn * d - 1) / 32 + 1;
in = (unsigned int *) malloc (d * n * sizeof (unsigned int));
for (i = 0; i < d; i++) /* read in the components */
for (j = 0; j < n; j++)
in[j * d + i] = getw (stdin);
out = (unsigned int *) malloc (w * sizeof (unsigned int));
for (i = 0; i < n1; i++) /* for every vector */
{
for (j = 0; j < logn; j++)
{ /* for every bit position of the input */
t2 = 0;
for (k = 0; k < d; k++)
{ /* for every component */
t1 = in[i * d + k];
in[i * d + k] = t1 >> 1;
t2 <<= 1;
t1 &= 1; /* get an input bit */
t2 |= t1;
}
for (k = w-1; k > 0; k--) /* paste d bits on the output */
{ /* Thanks to Willem van de Water */
t1 = out[k-1] >> (32 -d);
out[k] = (out[k] << d) | t1;
}
out[0] = (out[0] << d) | t2;
}
(void) fwrite ((char *)out, sizeof (int), w, stdout);
/* write the output */
}
}
End of interleave.c
#==========================
#= Here is file polly.c =
#==========================
echo polly.c 1>&2
cat >polly.c <<'End of polly.c'
/* polly(n,d)
calculates the total redundancy (generalized mutual
information) of a sample of n d dimensional vectors.
Ex: cat vect|polly 1024 3
where vect is a file of interleaved coded vectors
n- #of vectors (power of 2)
d- #of degrees of freedom of vectors
*/
#include
#include
int mask, d, ttd, flag;
double ci, logttd;
main (argc, argv)
int argc;
char *argv[];
{
int n, n1, i, logn, w, l, *in;
double log (), evlbx (), rks (), inf, rn;
n = atoi (argv[1]);
d = atoi (argv[2]);
n1 = 1;
for (i = 1; n1 <= n; i++)
n1 = 1 << i;
logn = i - 2; /* logn is the integer part of base 2 log
of n */
n1 = 1 << logn;
ttd = 1 << d; /* ttd is two to the d */
mask = ttd - 1; /* d least significant bits of mask are
set */
ci = rks (ttd - 1);
rn = (double) ttd;
rn = (rn - 1) / rn;
ci = rn * rn * ci; /* ci is confidence interval for chi
square test */
logttd = d * log (2.0);
w = (logn * d - 1) / 32 + 1;/* w is the number of 32 bit words needed
to represent each event */
l = n * w; /* l is # of words to store all the events
*/
in = (int *) malloc (l * sizeof (int));
if (in == NULL)
{
fprintf (stderr, "malloc fail in main\n");
exit (0);
}
fread (in, sizeof (int), l, stdin);
flag = 1; /* force subdivision at first level of
evlbx */
inf = evlbx (in, n1, logn * d);/* all the work is done here */
rn = n1;
inf = (inf / rn - log (rn));
inf = inf / log (2.0);
printf (" %5.2f ", inf);
}
double evlbx (ix, n, b)
int n, b;
unsigned int *ix;
/*
USE: Redundancy = (evlbx()/n -log(n))/(d-1)
ARGUMENTS:
ix[] The codes of the events in the element. Bits are
preinterleaved, and their order is inverted.
starting with the least significant bit position
the representation of an event in a 3d
calculation is...
bit#
ix[0] 0 Most significant bit of component #0
1 Most significant bit of component #1
2 Most significant bit of component #2
3 2nd most significant bit of comp #0
4 2nd most sig bit of comp #1
:
:
30 10th most sig bit of comp #0
31 10th most sig bit of comp #1
ix[1] 0 10th most sig bit of comp #2
1 11th most sig bit of comp #0
etc.
n # of events in the element.
b # of significant bits left in each event of ix
EXTERNALS:
mask A mask with only d bits set, for stripping
off one level of ix
d The demension of the vectors being worked on.
ttd 2**d
ci Confidence Interval. The 20% level for a chi-square
test of ttd-1 degrees of freedom.
logttd log(ttd) or d*log(2)
flag force subdivision on first level of recursion.
*/
{
extern int mask, d, ttd, flag;
extern double ci, logttd;
int w, bnew, i, j, k, l, wnew, *nd;
unsigned int **dix;
double f, s, t, inf, log (), rks ();
w = (b - 1) / 32 + 1; /* #of words needed to hold b bits */
bnew = b - d; /* # of bits left after d used at this
level */
wnew = (bnew - 1) / 32 + 1; /* # of words needed at next level */
nd = (int *) malloc (ttd * sizeof (int));
/* allocate array for event count of daughter elements */
for (j = 0; j < ttd; j++) /* Now count events in daughter elts */
nd[j] = 0;
for (i = 0; i < n; i++)
nd[ix[i * w] & mask]++;
f = (double) n / (double) ttd;/* do test for sub-structure */
s = 0.0; /* details at end of this file */
for (i = 0; i < ttd; i++)
{
t = (double) nd[i] - f;
s += t * t;
}
s = s / n;
if ((s > ci) || (flag == 1))
{ /* If substructure found then subdivide
the current partition element */
flag = 0; /* don't force subdivision after the first
level */
dix = (unsigned int **) malloc (ttd * sizeof (int *));
for (j = 0; j < ttd; j++)
{
if (nd[j] > 1) /* allocate arrays for events in daughter
elts */
dix[j] = (unsigned int *) malloc (nd[j] * wnew * sizeof (int));
else
dix[j] = NULL;
nd[j] = 0;
}
/* divide events in ix into appropriate daughter elts and discard the
d bits that have been used up */
for (i = 0; i < n; i++)
{
k = ix[i * w] & mask;
if (dix[k] != NULL)
{
for (l = 0; l < wnew; l++)
{
dix[k][nd[k] * wnew + l] = ix[i * w + l] >> d;
dix[k][nd[k] * wnew + l] |= ix[i * w + l + 1] << (32 - d);
}
}
nd[k]++;
}
free (ix);
/* Invoke recursion formula 20b from Fraser & Swinney Phys Rev A Feb.
86 */
inf = (double) n * logttd;
for (j = 0; j < ttd; j++)
{
if (nd[j] > 1)
{
inf += evlbx (dix[j], nd[j], bnew);
}
}
free (dix);
}
else /* when no substructure was found */
{
free (ix);
s = (double) n;
inf = s * log (s); /* PRA 2/86 Eqn# 20a */
}
free (nd);
return inf;
}
double rks (n)
int n;
{
double r, rn, sqrt ();
/* returns the 20% confidence interval of the chi-square test for n
degrees of freedom
*/
if (n == 3)
return 1.547;
if (n == 7)
return 1.400;
if (n == 15)
return 1.287;
if (n == 31)
return 1.205;
rn = n;
r = 1.0 + 1.1314 / sqrt (rn) -.24 / rn;
return r;
}
/* chi square test.
m- # of boxes
ni- # of elements in the ith box
N- sum of ni
var- variance of the assumed parent distribution
rks- reduced chi-square statistic
ssq- sample variance
Hypothesis A; The parent distribution is a flat multinomial
(see Drake page 275).
The chi-square test consists of rejecting A if the probability of
getting a {ni} more nonuniform than the one measured is less than .2.
(see Bevington pg 188 & 315)
rks = ssq/var 188 Bevington
ssq = (sum (ni - E(ni))**2)/(m - 1) everyone knows
var = N(1/m)(1 - 1/m) = N*(m - 1)/m**2 Drake pg 275
rks = (sum(ni - N/m)**2) /( ((m-1)/m)**2 *N)
The 20% points of rks are in Bevington pg 315.
Also see Knuth "Seminumerical Algorithms" page 39
*/
End of polly.c
#==========================
#= Here is file loop =
#==========================
echo loop 1>&2
cat >loop <<'End of loop'
#loop file length Ti Ts Tf di ds df outfile
# $1 $2 $3 $4 $5 $6 $7 $8 $9
echo "loop (file = $1 length = $2 Time = $3 $4 $5 dim = $6 $7 $8 out = $9" >$9.out
echo " \c" >>$9.out
rm /tmp/af$9*
T=$3
while expr $T \< $5 >>/dev/null
do
echo " $T\c" >>$9.out
d=2
while expr $d \< $8 >>/dev/null
do
expr \( $d - 1 \) \* $T >>/tmp/af$9t
d=`expr $d + 1`
done
T=`expr $T + $4`
done
cat /tmp/af$9t |sort -u >/tmp/af$9ts
rm /tmp/af$9t
for TT in `cat /tmp/af$9ts`
do
Tb=`expr $TT \* 4 + 1`
cat $1|tail +"$Tb"c|scode $2 >/tmp/af$9$TT
done
#wait
echo "\ndim\\delay" >>$9.out
cat $1|scode $2 >/tmp/af$9base
d=$6
while expr $d \< $8 >>/dev/null
do
echo "\n $d \c" >>$9.out
T=$3
while expr $T \< $5 >>/dev/null
do
cat /tmp/af$9base >/tmp/af$9ct
i=1
while expr $i \< $d >>/dev/null
do
TT=`expr $i \* $T`
cat /tmp/af$9$TT >>/tmp/af$9ct
i=`expr $i + 1`
done
cat /tmp/af$9ct|interleave $2 $d|polly $2 $d >>$9.out
T=`expr $T + $4`
done
d=`expr $d + $7`
done
for TT in `cat /tmp/af$9ts`
do
rm /tmp/af$9$TT
done
rm /tmp/af$9*
End of loop