Google
 

Tuesday, November 27, 2007

Prime number compression

Earlier today I was musing about how the product of two prime numbers can have less bits than the two prime numbers themselves and came up with a prime number compression technique, where you multiply prime numbers associated with the data you are compressing, and to decompress you'd factor to get back the primes.

Now there can seem to be multiple problems with the idea so I'll step through what I think they are as well as talk about how it works with more detail.

For example 2 and 3 each have two bits, as 2 is 10 in binary and 3 is 11, while 6 has 3 bits as it is 110, so you save 1 bit by multiplying 2 and 3 together. BUT if say 2 were associated with 'h' and 3 were associated with 'i', to be sure you had 'hi' when you decompressed you'd need one more bit to give the order, so it's a wash.

Try 3 and 7, and you have 11 and 111, while 21 is 10101, and it's even worse.

Hmmm...maybe this idea isn't so great.

But how about 5 and 17? Now you have 101 and 10001, where 5*17 = 85, which is 1010101 in binary, and you save one bit again.

Now multiply by 2, and you have 170, which is 10101010 and you've saved 2 bits.

Not a lot and it is compression where I notice a rule that 2 and primes near powers of 2, like 5, near 4 and 17 near 16, give me that one bit, but now I still have the ordering problem.

The ordering data tells what order the prime numbers are in, like if I have 2 matched with 'c', and 5 matched with 'a' and 17 matched with 't', then after factoring to see what the primes are, the ordering bits would tell you that you have 'cat'.

Given 3 primes there are 6 combinations as you go 3!, so it's even worse with 4 primes as then you have 4! = 24.

Hmmm...maybe the problem is in trying to associate each possible letter with some prime, as instead why not consider the data stream itself directly?

Presume the data is already compressed so this technique would be an add-on, and let's match primes by bitlength so look to replace 2 and 5 or 17 directly in the data stream. Consider then starting with 10, as that would occur 50% of the time with any two bits, and look ahead now for 5, which would occur roughly 1/5 of the time in the next sequence, so 10% probability. For 2 followed by 17, you have about a 0.588% probability. Or about an 10.58% probability that you get one of those.

But you have to bracket the compressed pieces so we need at least 3 in a row, but since the idea now is to take the primes from least to greatest we can have more than one at a time, like 2 in a row, so the question is, what is the probability of having 2, followed by 2, 5 or 17 followed by 2, 5 or 17?

I did something quickly and found about 13% probability, but could be wrong but I think now there is the point that you can loop through the data stream itself, multiply together when you see sequences of small primes 2, 5 or 17 in a row of at least 3 from least to greatest and put in a flag to note those areas and then decompress later by factoring in place.

And notice if you have 2 followed by 2 followed by 2, when your factoring gave you back 8, you'd know you had 101010 in the data stream itself at that point, and I think you can use primes like 3 in certain ways as long as predominantly you have primes like 2 or 5, as, for instance 2*2*3 = 12, gives a 2 bit savings.

Depending on whether or not I screwed up with my quick calculation above, and noting that the compression algorithm would take sequences as long as they went, where the longer the sequence the greater the compression--one bit at a time per prime--I'd think you could add at least another 10% compression on top of any other compression scheme.


James Harris

Tuesday, November 06, 2007

Class Viewer D?

Continuing to think about the next thing for Class Viewer I am coming now to the idea of variations on the theme, where now I'm wondering about a Class Viewer D, where "D" is for Development. The main thing new there would be that along with being able to go to javadocs when you click on a method, you'd be able to open the source code if it is available in your classpath and go directly to that method in the code in a programmer's text editor.

Thing about this idea is that it seems to me that it'd be rather easy to program though now I'd need to bundle a text editor but there are open source ones available, and if they do not allow being opened with a call to a line number, then I'd just add that feature.

That has me thinking then about plug-ins for Class Viewer where I set things up where additional features can be done by other developers as I think about a lightweight development suite around my little program. But that is more the dreaming stuff, as knowing me it's quite possible I'll still be muttering about these things a year from now.

Oh yeah, I'm still pondering the Class Viewer Enterprise variation as well, where Class Viewer is a gateway to any company's codebase, where the opening second screen which is all whitespace now would have some kind of branded welcome and maybe news where it could be on an intranet or the Internet.

I am currently looking a having some ideas really nailed down by June 2008, but may do something sooner if I could get some kind of partnership with funding going.

Without that I think the odds are increasing that Class Viewer D might move forward sometime in Fall 2008, with rapid development, so I'd have something out that fall, but with me there is no telling if I'll change my mind later or not.

Monday, November 05, 2007

Fast boot, bypassing Windows

(Click title of blog post to go to article)
Source: Wired

Press F4 to Bypass Windows With Fast-Boot Technology
By Bryan Gardiner Email 11.05.07 | 12:00 AM

There's absolutely no reason you should be waiting the three-plus minutes it takes your computer to boot up Windows, says Woody Hobbs, CEO of Phoenix Technologies. And indeed, if Hobbs has his way, you may not have to endure those waits much longer....


Yeah!!! It is so silly that as advanced as computers are today you have to wait at all for the silly, poor things to boot up. Just silly. Phoenix Technologies could save millions of lost minutes for people around the world.