How Many Times Do I Have to Shuffle This Deck?
What is surprising is that a mathematical analysis of this seemingly innocuous question is linked to a wide range of mathematical questions, and has been generalized in significant ways…
Introduction
A month ago, I was playing the card game Crazy Eights with one of my children. During the middle of one game, we went through the entire deck, so we shuffled the cards in the discard pile a few times and started playing again. Hmmm. The cards were still grouped too closely by suits; that is, in spite of the shuffles, we were still seeing long sequences of, say, clubs when drawing from the pile. The ordering from the first part of the game persisted after the shuffles; it would seem that the shuffles hadn’t affected the ordering of the deck as much as we wished.
This led me to wonder how many shuffles would be needed to give a good guarantee that the deck had been put in random order. Curiously, I soon came across an essay in Aigner and Ziegler’s wonderful book Proofs from the Book that addressed just this issue and that led me to a remarkable paper by Aldous and Diaconis that gives an answer to this question.
This column will explain a few of the techniques and results from that paper. What is surprising is that a mathematical analysis of this seemingly innocuous question is linked to a wide range of mathematical questions, in areas such as Lie algebras, Hochschild homology, and random walks on graphs, and has been generalized in significant ways.
Let’s begin by supposing that we have a deck of n cards; to keep things simple, we will assume that the cards are decorated only with the numbers1to n. Initially, the cards are in numerical order and we will shuffle them using some technique that we agree upon. The usual shuffle is called ariffle shuffle and will be discussed later. For now, we’ll begin with a simpler type of shuffle called the topinatrandom shuffle. In this shuffle, we remove the top card from the deck and place it, with equal probability, in any of the n available positions.
π  Q(π) 
1 2 3  0.333 
1 3 2  0.000 
2 1 3  0.333 
2 3 1  0.333 
3 1 2  0.000 
3 2 1  0.000 
This shows that three of the six possible orderings are equally likely, and the other three are not at all possible. So we shuffle again … and again. The possible orderings of the deck are shown below after 1, 2, and 3 shuffles. For convenience, the cards are here represented by colors rather than numbered.
Q1(π)  Q2(π)  Q3(π)  Q4(π)  Q5(π)  Q6(π)  Q7(π)  Q8(π)  Q9(π)  Q10(π)  
1 2 3  0.333  0.222  0.185  0.173  0.169  0.167  0.167  0.167  0.167  0.167 
1 3 2  0.000  0.111  0.148  0.160  0.165  0.166  0.166  0.167  0.167  0.167 
2 1 3  0.333  0.222  0.185  0.173  0.169  0.167  0.167  0.167  0.167  0.167 
2 3 1  0.333  0.222  0.185  0.173  0.169  0.167  0.167  0.167  0.167  0.167 
3 1 2  0.000  0.111  0.148  0.160  0.165  0.166  0.166  0.167  0.167  0.167 
3 2 1  0.000  0.111  0.148  0.160  0.165  0.166  0.166  0.167  0.167  0.167 
Notice that as we perform more shuffles, the frequency with which we see the different orderings becomes more uniform. By U(π), we will denote the probability density in which each ordering π is equally likely; that is, U(π)=1/n! The results in the table above indicate that
To make this statement precise, and to faciliate a quantative analysis of shuffling, we need to introduce a notion of distance between probability densities. Therefore, suppose that Q1 and Q2 are two probability densities on Sn. First, we define the difference in Q1 and Q2 for A, a subset of Sn, by adding together the differences over all the elements of A:
Then the distance between Q1 and Q2 is the maximum distance over all the subsets of Sn:
Note that this is a very sensitive measure of the distance between the two densities. First, notice that 0≤∥Q1−Q2∥≤1. Next, imagine that U, the uniform distribution on Sn, and Q is a density resulting from a particular type of shuffle. If this type of shuffle leaves a single card in its original place, then
which is a rather large distance considering that the maximum distance between sets is 1.You may wish to check that the distance between the initial distribution I, which has I(12…n)=1 and I(π)=0 for all other orderings, satisifies:
With this definition, our example indicates that ∥Qk−U∥→0 as k→∞, which means that shuffling the deck using topinatrandom shuffles will eventually produce all orderings of the deck with equal probability.
Strong uniform stopping rules
Given a technique for shuffling the deck, and its resulting probability density function Q, we would like to be able to estimate how far away from the uniform distribution we are after k shuffles; that is, we would like to understand the distance ∥Qk−U∥.
The notion of a strong uniform stopping rule gives us a tool for understanding this distance. First, imagine that we look at each shuffle in a sequence of shuffles and that we have a criterion that stops the sequence when satisfied.
For example, imagine that we have a sequence of topinrandom shuffles and that we stop the sequence when the card that was originally at the bottom of the deck (and labeled by n) rises to the top and is then inserted somewhere in the deck.
This is illustrated below for the case n=3. The bottom card is shown in red so we stop the sequences at the orderings in the gray box.
Suppose that T denotes the stopping time for a sequence of shuffles and P(T>k) denotes the probability that the stopping time of a sequence is larger than k. Then we have the following bound on the distance beween Qk and U: ∥Qk−U∥≤P(T>k).

In other words, this reduces the problem of comparing the distance between the density after k shuffles and the uniform density to an analysis of the stopping time. As we’ll see, this can be a much simpler problem.Where does this bound come from? It’s fairly straightforward. Remember that the distance ∥Qk−U∥ is defined by looking at the difference in the densities on subsets A of Sn. By Xk, we will denote the ordering that results from shuffling k times.
We’ll begin by noting that we may partition the sequences of shuffles in which Xk is in A by their stopping times:
We will rewrite both of the terms on the right. Consider the first term: here’s where we use the fact that the orderings that result from a fixed stopping time are uniform:
We will rewrite the second term using the conditional probability:
Combining these two expressions gives
which is the same as saying
Now since P(Xk∈AT>k) and U(A) are both probabilities, their values are between 0 and 1. Therefore, the absolute value of their difference is as well as we obtain:
Since this holds for every set A, we obtain the desired bound:
Riffle shuffles
Armed with this tool, we may study how fast a particular shuffling process converges to the uniform distribution by finding a strong uniform stopping rule and determining the probability that the stopping time is larger than k. We will apply this tool to a study of riffle shuffles, which model the usual way in which a deck of cards is shuffled.
To perform a riffle shuffle, we choose c cards from the top of the deck according to the binomial distribution. That is, the probability that we choosec cards is
We hold these cards in our left hand and the other n−c cards in our right. A card now falls from either our left or right hand with the probability that the card falls from one hand being proportional to the number of cards in that hand.
This description models the physical act of shuffling cards. However, an alternate way to describe this shuffle is like this: choose the c cards off the top according to the binomial distribution. For instance, here we choose c=3 cards.  
Now choose, with equal probability, a set of c positions within the deck of n cards.  
Place the chosen cards, in order, in these positions, and the remaining n−c cards, in order, in the other positions. 
We will denote by R the probability density on Sn defined by one riffle shuffle. We wish to study Rk, the density that results after k riffle shuffles, and determine when it is sufficiently close to the uniform density to consider the deck well shuffled.
It turns out that it is easier to answer this question by considering what happens when we undo a riffle shuffle. To do this, simply walk through the deck labeling every card, with equal probability, with a “0” or “1.”  
Then use the labels to divide the cards into two ordered groups and place the group labeled “0” on top. 
This results in the inverse riffle shuffle:
which implies that ∥R¯¯¯¯k−U∥=∥Rk−U∥.
A stopping rule for inverse riffle shuffles
Now that we have seen that the densities defined by inverse riffle shuffles lie the same distance from the uniform density as those defined by riffle shuffles, we would like to develop a strong uniform stopping rule for inverse riffle shuffles. But first, let’s describe a means of enumerating inverse riffle shuffles.
Suppose we perform a sequence of k inverse riffle shuffles. In each shuffle, each card is labeled with a “0” or a “1.” If we concatenate the labels from each shuffle, then each card is labeled with a string of k 0’s and 1’s. (We usually call such strings kbit strings.)
To summarize, a sequence of k inverse riffle shuffles defines an ordering of the n cards as well as a set of n kbit strings. Conversely, if we choose an ordering of the cards π and a set of n kbit strings, we may define a sequence of k inverse shuffles: begin with the standard ordering of the cards and perform inverse shuffles, based on the appropriate bit:
The ordering on the deck of cards that results is π.The stopping rule is simple: stop when the cards have distinct labels. To see that this is a strong uniform stopping rule, simply notice that the choices of an ordering π and a set of n kbit strings are independent. Therefore, if B is a set of distinctkbit strings and π1 and π2 are two orderings of the deck, then (π1,B) and (π2,B) give two sequences of inverse riffle shuffles that are equally likely and that lead to π1 and π2, respectively. Therefore, we are just as likely to see π2 as π1 as a result of a sequence of inverse riffle shuffles with stopping time k.
Now we’re in great shape. If T is the stopping time of a sequence of inverse riffle shuffles, we may apply our earlier result:
We’ll compute P(T>k), the probability that the stopping time is greater than k, by working instead with P(T≤k)=1−P(T>k). Suppose we have a sequence of k inverse riffle shuffles with T≤k (if T<k, just shuffle a few more times for a total of k shuffles). As before, this leads to an ordering π and a set of n distinct kbit strings. The number of orderings π is n! and the number of n ordered distinct kbit strings is n!(2kn) since there are 2k kbit strings. This means that the number of sequences of k inverse riffle shuffles with a stopping time less than kis
Now the total number of sequences of k inverse riffle shuffles is the number of orderings π times the number of length n kbit strings, which is n!(2k)n. This gives:
This leads to the estimate
With this in hand, let’s consider shuffling a typical deck of n=52 cards using riffle shuffles. We find the following bounds on the distance of Rk and U, which we will denote by d(k):

With this as our guide, we see that d(k) is relatively constant for small k but decreases exponentially for larger k. There is consequentially a region of transition where d(k) changes from roughly constant to decaying exponentially. It seems reasonable that some value k in the middle of this transition gives an appropriate number of shuffles to guarantee that the deck is sufficiently randomized. Using this model, it appears thatk=12would be a good recommendation for the number of shuffles.
Summary
A more sophisticated study enabled Diaconis and Bayer to improve this bound: for a deck of n=52 cards, they found

Based on this analysis, Diaconis has written that “seven shuffles are necessary and suffice to approximately randomize 52 cards.” Of course, our technique has just given an upper bound for the distance between Rk and U. In fact, J. Reeds, in an unpublished manuscript, introduced a technique to find a lower bound: ∥R6−U∥≥0.1. This lends further evidence to the fact that seven shuffles should be used. Please see the Postscript to this article for a further discussion. .Of course, we could use a computer to generate random orderings of the deck, but we still need a reliable method to ask the computer for the next ordering. Bridge Clubs originally swapped 60 cards at random, but Diaconis and Shahshahani showed that this is not enough to generate sufficiently random orderings.
Knuth describes a process that proceeds in stages beginning at i=1. Choose a random number j so that i≤j≤n and interchange the cards at positionsi and j. This generates the uniform probability density with 51 swaps.
A description of the current process using by Bridge Clubs is given by the website Bridge Hands.
Does it really matter? Yes! Martin Gardner describes some card tricks that rely on the fact that three riffle shuffles is not enough to generate random orderings. For example, suppose that a deck is shuffled three times and cut in between the shuffles. If a card is taken out, recorded, and put back in the deck in a different position, then that card can be determined almost all the time. De Bruijn also describes a similar trick.
The model we have adopted for the riffle shuffle seems reasonable, but how well does it model shuffles performed by a human? It depends on the human. Studies show that professional dealers drop single cards from one hand roughly 80% of the time and pairs about 18% of the time. Less experienced shufflers drop single cards about 60% of the time. Of course, when you’re playing Crazy Eights with a kid, the cards sometimes get thrown all over the room resulting in the uniform distribution after one toss!
Mathematical Card Deal Theory and it’s Advantage
The clear casino’s advantage derives from the fact that the Dealer draws last. If you and the Dealer go over (bust), casino wins. Have you ever noticed that the Dealer deals cards with a specific way? Instead of dealing two cards (first way) to the first player, two cards to the second player, two cards to the third player etc, he deals one card to each player and himself, and starts again from the beginning (from the first player) dealing the second row of cards to the players (2nd way). Let’s say that the deck contains the following row of cards: A1, A2, A3, A4 … Aμ1, Αμ.
If the Dealer deals cards using the first way, then they shall receive the following cards:
 the first player (A1, A2)
 the second player (A3, A4)
 the third player (A5, A6)
 the ν player (A2ν1, A2ν)
The second way though, is the one that the Dealer uses, that means that they shall receive the following cards:
 the first player (A1, Aν+2)
 the second player (A2, Av+3)
 the third player (A3, Aν+4)
 the ν player (Aν, A2ν+1)
However according to the “packs phenomenon” (many large or small cards grouped together), if A=10 (a pack of cards with value of 10) then the pairs of the cards (A1, Aν+2), (A2, Av+3) …(Aν, Α2ν+1) create more sumtotals from 12 το 16 than the pairs of the cards (A1, A2), (A3, A4) …(A2ν1, Α2ν).
Therefore, you must stand on as low as 12, because the “packs phenomenon” and the specific deal of the cards will burn your hand in higher percentages, fact that offers to casino a clear advantage over you. For this reason you must stick to your cards and all the possibilities are in favour of you, because you force the Dealer to hit and go over (BUST)! The way that Dealer will gather players cards in combination with an inadequate shuffle will result the creation of the “packs’ phenomenon” ex.
*1st player’s cards 7, 6, 2, 7
* 2nd player’s cards 5, 4, 4, 2, 8
* 3rd player’s cards 2, 5, 8, 6
* 4th player’s cards 6, 8, 2, 7
* 5th player’s cards 5, A, 10, K.
Pay attention to the flow of the cards inside box (Shoe) after Dealer gathers it:
7,6,3,7,5,4,4,2,8,6,8,2,7,5,Α,10,Κ,2,5,8,6.
From 7 to A there are fourteen babes (small cards) in the flow! This is a pack of small cards (babes pack). Even in the condition that the players have good hands that no need improvement over Dealer’s Face Card, it is created once more the “packs phenomenon” but with strong cards.
Obsessive Ideas
There is a certain detail that even the best blackjack professionals can hardly understand while it is very simple.They use the word “longterm” and they think that it is approached through the giving this way equal statistic period of appearance for every card and finding the percentage of advance or not for every. Moreover they do they consider the Law of the Big Numbers assuming as an unfortunate example an ideal coin! That means that the probability of heads and tails is of equal value!
In real life there is no such thing as a perfect coin!! Even for thousands of a gram one of the two sides will be heavier, so longterm speaking there will be a greater chance of appearance of the heavier side!! Something similar applies for Blackjack. The deck hasn’t got enough time to be properly mixed (the coin has one side heavier). As a comparison one can say that the less mixed a deck of card is the heavier the coins side is. And the same game goes on with a deck that isnt mixed. When the deck reaches the point where it is mixed then the Casino stops the game (shuts the table) and the table reopens when the decks are set again to their original position.
Therefore one can be playing longterm by using the shortterm mixing of the deck. He can never be able to achieve statistically equal period of appearance for every card!! Simply the Law of the Great Numbers will make its appearance making the unequal statistically period of i.e. how many times the Dealer’s “6” binded with a queen and a “5” and made a “21” more notable. But this statistically period appears in greater percentages than it should.
The wrong base of all the Theories that exist in the market is resbonsible for the Economic Disaster of every player from the simplest player to the professional Card Counter. Casino of great reputation in Greece was using 6 decks for the Blackjack while since January of 2006 uses 4 decks of cards!! As a matter of fact at their site this Casino even now mentions that at their game they use 6 decks! Did they forget to correct this? You think that this sudden reduction of the decks (less mixing is required for acquiring the good assortment) that appeared at the same period of time that our site appeared is coincidental?
World BlackJack Originalities by BlackjackASSUS
How many times have you played Basic Strategy and lost all your capital rapidly?
How many times have you won and how many times have you lost while playing precisely the Basic Strategy?
Basic Strategy plights that if you precily apply it during your game,you will lose in longterm only 1% of your money.But how much is longterm?Most likely when t extends to infinity (t>oo)…
And how many years live an average person?
Eventually we could assume that the Basic Strategy was established to be using by Aliens who live eternal…
It is said that probably the condition longterm of Basic Strategy means a definite time as it is proved by computers.But computers can shuffle the decks infinite times while casinos can’t do that.
A few years ago, Harvard University studied the shuffling of cards
* (primary assortmentand is:
Α,2,3,4,5,6,7,8,9,10,J,Q,K,Α,2,3,4,5,6,7,8,9,10,J,Q,K,Α,2,3,4,5,6,7,8,9,10,J,Q,K,Α,2,3,4,5,6,7,8,9,10,J,Q,K,Α,2,3,4,5,6,7,8,9,10,J,Q,K,Α,2,3,4,5,6,7,8,9,10,J,Q,K )
and concluded that one pack of 52 sorted cards needed 7 shuffles (constant). So, if a casino uses a shuffling machine with 6 packs of cards to achieve a good assortment, the packs must be shuffled 7χ7χ7χ7χ7χ7=117.649 times.
It’s practically unobtainable in good time, because it takes days for this slowing down games at casinos and limits their gain.
So casinos can’t avoid the appearance of the ”pack phainomenon”!
The Basic Strategy necessitates infinite shuffles of the decks (shuffle machines can’t do it practically) to achieve chance setting of the cards,and infinite time to apply the Law Of Large Numbers.Casinos rely in the above condition playing 24 hours a day with a inexhaustible capital! That’s why Casinos win constantly,because Basic Strategy is the certain way for a player to lose all his money rapidly!!
The question is how we could turn to advantage the “packs phainomenon”…How can we manipulate these capricious distributions of the decks?If we only could (in a way) “make” these distributions and thereafter seek for the appropriate strategy that will turn to account the prospects of our cards and dealer’s card. That is exactly what our program do!!It “makes” any distribution of the decks and demonstrates the odds of our success in a definite time:
• You don’t have to remember numbers
• You don’t have to practice constantly, it’s
ֺ easy in it’s use!
• You don’t need to strain effort ֺ just clever way!
• It shows precily the array of the cards that
ֺ appear!
• It simulates any Casino without regard to ֺthe number of decks that it uses
• It’s versatile in any financial ֺ strategy of the player!
• It rejects Basic Strategy!
• It’s based in a stunning random algorithm
ֺ that generates numbers!
• The first Program worldwide that ֺ simulates the Black Box (shuffles ֺthe cards with replacement)ֺand the “Shoe” (withought ֺreplacement)!
• It has accuracy of many digital numbersֺ in it’s results with percentage %!
• It draws up to 1,000,000 times and
ֺ uses 130 decks in a few seconds!
• It has simple apearance and use!
• It has small size (2 MB), but fast in it’s results!
The BlackjackASSUS occupies exclusively with the discovery of new theories and concepts in the Theory Of Numbers,which is the capstone of Maths(as it’s called by the scientific community) and by extension Blackjack,which the only technical and not lucky game in the Casinos.
Really, Blackjack conceals so much science in it’s depth that verges art…
MATHEMATICAL DEVELOPMENTS FROM THE ANALYSIS OF RIFFLE SHUFFLING
 Introduction The most common method of mixing cards is the ordinary riffle shuffle, in which a deck of n cards (often n = 52) is cut into two parts and the parts are riffled together. A sharp mathematical analysis for a natural model of riffle shuffling was carried out by Bayer and Diaconis (1992). This gives closed form expressions for the chance of any permutation and allows analytic approximation and exact numerical evaluation to show things like “seven shuffles are necessary and suffice to approximately randomize 52 cards”. These results are carefully stated in Section 2A.
The shuffling work builds on earlier studies of Jordan (magic tricks), Borel (bridge), Gilbert, Shannon, Reeds (basic model) and D. Aldous (coupling). This background is described in Section 2B. The “seven shuffles” result is mildly dependent on the choice of metric and a number of alternative measures of randomness are discussed in Section 2C.
There is a mathematical reason that allows riffle shuffles to be analyzed so completely. The basic shuffling model falls squarely into Solomon’s descent algebra (and indeed gives an independent development). This allows shuffling theorems to be translated into permutation enumeration results (e.g. how many permutations have a given number of descents and a given cycle structure). The eigenvalues of the Markov chain underlying shuffling were actually first determined in an investigation of Hochschild homology(Hanlon). There is an intimate connection with free Lie algebras and the PoincareBirkoffWitt Theorem (BergeronBergeronGarsia). The chance of a given cycle structure after riffle shuffling equals the chance that a random degree n polynomial has a given number of irreducible factors. This in turn is explained by considering the connection between shuffling and the action of the associated Lie type group SLn(Fq) on its Lie algebra (Fulman). Finally, shuffling gives a fairly direct interpretation of Schur symmetric functions (StanleyFulman). These results are described in section three.
The analyses above seem so rich and natural that they call out for generalization. A sweeping generalization of the theory was discovered by Bidigare, Hanlon and Rockmore. This involves random walk on the chambers of a hyperplane arrangement. The classical braid arrangement gives riffle shuffles but there are many other hyperplane arrangements where the chambers can be labeled by natural combinatorial objects and much (but not all) of the theory goes through. In an amazing synthesis, Ken Brown has shown that almost everything can be pushed through to random walks on idempotent semigroups. This allows analysis of natural random walk on the chambers of spherical buildings. These results are described in Section 4.
The final section describes ten open problems.
Throughout I have tried to show the links with algebra. To be fair, many of the authors cited have no interest in the card shuffling implication of their work. This paper has improved from detailed comments of Ken Brown, Nantel Bergeron, Jason Fulman, Adriano Garsia and J.C. UyemuraReyes.
2A. Basic Shuffling.
The basic riffle shuffling model was introduced by Gilbert and Shannon (See Gilbert (1955)) and independently by Reeds (1981). It can be described as a probability distribution Q(w) on the symmetric group Sn – the GSR Distribution. One description of Q is as follows: cut the deck into two piles according to the binomial distribution so the chance that pile one has j cards is ( n j )/2 n. Then, sequentially drop cards from the bottoms of the two piles according to the following rule: if at some stage pile one has “A” cards and pile two has “B” cards, drop the next card from pile one with probability A/(A + B). This is continued until the two piles are exhausted and then the piles are pushed together. An equivalent description in terms of inverse riffle shuffles is due to Gilbert, Shannon and Reeds. An inverse shuffle begins by labeling each of n cards zero or one by a flip of a fair coin. Then, all the cards labeled zero are removed and placed on top keeping the cards in the same relative order. It is a simple exercise to show that the forward and backward descriptions are the same. From either description, given the cut, all ways of interleaving are equally likely, so the GSR shuffle is a maximum entropy model. The identity has probability n+1 2n while all other possible permutations have probability 1/2 n.
Repeated shuffles are modeled by convolution:
Q ∗ Q(w) = X u Q(wu−1 )Q(u)
Thus the chance of w after two shuffles is calculated as the chance of first choosing u and then choosing the permutation resulting in w. Similarly, Q∗k (w) = ΣuQ∗(k−1)(wu−1 )Q(u). These ingredients complete the description of the GSR measure Q∗k (w). Of course, shuffling is an example of random walk on a group and of a finite statespace Markov chain. See SaloffCoste (2001, 2002), for an extensive overview with relevance to shuffling.
Repeated shuffling converges to the uniform distribution U(w) = 1/n!. The earliest works on Markov chains, due to Markov (1906) and Poincare (1912), used shuffling cards as an example. They gave results which allow us to conclude that
Q ∗k (w) → U(w) as k → ∞.
It is natural to try to quantify this statement. The usual distance to stationarity is the total variation distance
kQ ∗k − Uk = max A⊂sn Q ∗k (A) − U(A) = 1 2 X w Q ∗k (w) − U(w).
Consider the middle term. Its interpretation is this: let A be any subset of Sn (e.g. the set of all permutations where the ace of spades is in the top half). Calculate the chance of A after k shuffles (that is Q∗k (A)). Calculate the uniform measure of A (namely A/n!). Take the difference between these numbers and then take the A which makes this difference as large as possible. This is a very nonforgiving distance. If kQ∗k − Uk ≤ then shuffling is close to uniform for any set A.
These definitions translate the basic question “how many times should a deck of cards be shuffled to adequately mix it?” into a well posed math problem: given > 0 how large should k be to have kQ∗k −Uk < ? A historical review of progress on this problem is contained in the following section. Here we state the basic result.
Theorem 1. When n = 52, the distance to uniformity is
k: 1 2 3 4 5 6 7 8 9 10 kQ∗k − Uk 1.000 1.000 1.000 1.000 .924 .614 .334 .167 .085 .043
For general n, and k = 3 2 log2 n + c kQ ∗k − Uk = 1 − 2Φ −2 −c 4 √ 3 ! + 0 1 n1/4 ! with Φ(x) 1 √ 2π Z x ∞ e −t 2/2 dt.
Remarks
The total variation distance is a number between zero and one. A graph of Q∗k vs k shows it stays close to its maximum until a bit before 3 2 log2 n and then falls exponentially fast to zero. The analysis shows that for k = 3 2 log2 n + c the distance goes to one doubly exponentially fast as c → −∞ and to zero exponentially fast as c → ∞. These asymptotic results are borne out by the data for n = 52. The cutoff occurs at about seven shuffles. From six shuffles on, the variation distance falls by a factor of 2 for each extra shuffle. It is worth noting that the table is derived from exact results from Theorem 2 below not from an asymptotic approximation.
It is natural to wonder if this mathematics has much to do with real shuffles. People used to think that cards were suitably well mixed after three, four or five shuffles. Like many things people believe, this is simply not true. In Section 2B a classical card trick and some extensive analysis of bridge hands are used to prove this point.
Theorem 1 is a consequence of a more central result. To explain it, it is useful to have a geometric description of riffle shuffles. Picture n points dropped uniformly and independently into the unit interval. Label the ordered points, left to right, as x1, x2, . . . , xn. Now perform the baker’s transform of [0, 1] to itself. This takes x → 2x(mod 1). The points xi are permuted inducing a permutation w. Note that there are a binomial number of the xi in [0, 1/2]. The baker’s map stretches these out to [0, 1]. The same holds for the points in [ 1 2 , 1]. These two sets of points are interrleaved. It is not hard to see that the induced permutation has exactly the GSR distribution Q(w).
This geometric description suggests a variant which will prove useful. For positive integer a, consider the ashuffle which results from n random points under the map x → ax (mod 1). In shuffling language one may cut the deck into apackets according to the multinomial distribution n n1 . . . na /an with 0 ≤ ni ≤ n, Σ a k=1ni = n. The a packets are sequentially mixed, dropping a card from the ith packet with probability proportional to packet size. These equivalent definitions result in a probability Qa(w). In present notation Q2(w) = Q(w). From the geometric de scription, it is easy to see that an ashuffle followed by a bshuffle is the same as an ab shuffle thus Qa ∗Qb = Qab and Q∗k 2 = Q2k. It is enough to study only ashuffles. The main result of BayerDiaconis can now be stated.
Theorem 2
For all positive integers n and a, Qa(w) = n + a − r n an with r = r(w) the number of rising sequences in w.
To explain, consider a permutation w as an arrangement of a deck of cards, with wi the label of the card at position i. Decompose w into disjoint rising sequences by finding card labeled 1, and then card labeled 2 if label 2 is below label 1. Continue until label k stopping if label k + 1 is above one of {1, 2, . . . , k}. Remove cards labeled {1, 2, . . . , k}. This is the first rising sequence. Continue with the reduced deck, finding {k + 1, . . . , k +`} a second rising sequence and so on. Thus, for n = 9, the permutation 716248359 has rising sequences 123, 45, 6, 789. Let r = r(w) be the number of rising sequences obtained. Thus 1 ≤ r ≤ n. Another description: r(w) = d(w −1 ) + 1 with d(w −1 ) the number of descents in w −1 . Descents will make a major appearance in Section 2C.
We will not give a proof of Theorem 2 here (see BayerDiaconis (1992) or the clear elementary treatment of Mann (1995)). It is straightforward from the geometric description using the “stars and bars” argument of elementary combinatorics. The hard part was discovering the result. We did this by looking at exact computer calculations for small decks (size 3, 4, 5) and noticing a pattern.
Theorem 2 gives yet another description of the GSR measure
Q(w) = (n + 1)/2 n If w = id , 1/2 n If w has 2 rising sequences, 0 otherwise.
Theorem 2 reduces the calculation of total variation to evaluating
Q ∗k − u = 1/2 Σ(n, j=1) kn(j) (n + 2k − j) ∗n/ 2 kn − 1/ n!
with kn(j) the number of permutations with j rising sequences. At this point another surprise occurred. The kn(j) are very well studied as the coefficients of Eulerian polynomials (Stanley, 1997). This allowed careful asymptotic analysis and led to Theorem 1.
This concludes our overview of the basic shuffling story. We turn to a bit of history and then some more mathematical consequences.
2B. History and Practical Consequences.
The earliest treatments of Markov chains treat card shuffling as a leading example (Markov (1906), Poincare (1912), Doob (1954)). These treatments show that shuffling cards eventually results in a well mixed deck. It is very hard to guess how many shuffles are needed. When n = 52, n! ˙= 8 × 1067. For the other popular method of shuffling (overhand) Pemantle (1984) shows order n 2 log n shuffles suffice. This is more than 2500 when n = 52.
Emil Borel in Borel and Cheron (1940) began a quantitative investigation by studying how long individual cards and pairs of cards take to randomize. This allowed him to conclude that at least six or seven shuffles are needed. Similar conclusions are drawn by Keller (1995).
Independently, magicians had discovered that rising sequences allowed good card tricks to be performed if cards are not well shuffled. Details and references appear in BayerDiaconis (1992). Bridge players went from hand shuffling to computer generated shuffling in their tournaments. A comparison of before and after suit distribution shows that the standard four or five riffle shuffles followed by a cut are grossly inadequate Berger (1973). Thorpe (1972) is an early survey detailing ways of taking advantage of poor shuffling in casino games.
Modern work on the mathematics of riffle shuffling begins with work of Gilbert (1955). He reports joint work with Shannon on the GSR model. They proved some combinatorial properties of GSR shuffles and suggested log2 n would be enough. The model was independently discovered by Reeds (1981) who made extensive computer studies. Aldous (1983) gave a coupling argument which proves that 3 2 log2 n shuffles are sufficient for n large. Aldous and Diaconis (1986) carefully prove that
kQ ∗k − Uk ≤ 1 − nY−1 i=1 1 − i 2 k .
This bound becomes less than 1 2 for k = 11 when n = 52.
An empirical study of the GSR model compared to actual shuffles appears in Diaconis (1988). This concludes that the model is a good fit. Of course, much depends on the shuffler – casino dealers (along with the present author) can shuffle close to perfectly and eight perfect shuffles recycle the deck! See DiaconisGrahamKantor (1983) or Morris (1990) for more of this. There is much further work to do in developing tractable models with a few parameters which allow individual tuning. Because of its maximum entropy property the GSR model offers a provable lower bound to any less uniform distribution.
As a final practical note, DiaconisHolmes (2000) analyze a class of mechanical ’shelfshufflers’ used in casino games. In these, a deck of n cards is distributed randomly onto a shelves. At each stage, cards are placed at random above or below previously placed cards on a shelf. At the end, the packets are output in random order (it turns out not to matter). The shuffle is not repeated. It turned out that the theory developed for type B (hyperoctahedral group) gave a complete analysis.
2C. Other Measures of Randomness The results of Theorem 2 allow computation in various alternatives to the total variation metric. Aldous and Diaconis (1983) derive results for separation distance
s(k) = max w 1 − Q∗k (w) U(w) = 1 − n! 2 k n 2 nk = 1 − nY−1 i=1 1 − i 2 k .
As discussed above this needs k = 11 to make it small when n = 52. Su (1995), TrefethanTrefethan (2002) and Stark et al. (2000) derive results for entropy distance that suggest k = 5 or 6 shuffles suffice when n = 52. The theorem of Stark et al. (2000) shows that the entropy distance decreases by a constant factor up to log2 n shuffles when it goes to zero exponentially. A graph of the distance versus entropy for small values of n seems to show a discontinuous derivative at log2 (n). If true, this would be a new kind of phase transition. Lovasz and Winkler (1995) use Theorem 2 to show that a very different distance, the expectation of the fastest strong stationary time will be small after k = 11.
All of the above are global measures of uniformity. In explaining the convergence results to a popular audience, the following notion seemed useful. Consider playing the following game. A deck of cards is on the table. Guess at the top card. This card is then shown and discarded. Then guess at the next card (which is then shown and discarded) and so on. If the deck is perfectly mixed, the chance that the first guess is correct is 1/n, the chance the second guess is correct is 1/(n−1), etc. Thus 1/n + 1/(n − 1) + . . . + 1 correct guesses are expected. When n = 52 this is about 4.5. Suppose instead that k riffle shuffles have been carried out. A conjectured optimal strategy for guessing was derived by McGrath (see BayerDiaconis (1992)). Using the strategy yields about 5.01 correct guesses after seven shuffles with 4.97 correct following seven shuffles and a cut. In related work, Ciucu (1998) studies the optimal guessing strategy following kriffle shuffles when no feedback is given. He proves that for 2n cards if k ≥ 2 log2 (2n) + 1, the best strategy is to guess card 1 for the first half of the deck and card 2n for the second half. For k < 2 log(2n), there are better strategies. In particular, after one shuffle he shows that guessing 1, 2, 2, 3, 3, 4, 4, . . . in order gives p 8n/π correct guesses asymptotically. His analysis rests on an explicit diagonalization of the Markov chain which tracks the position of the card labeled 1. This is closely related to work in Section 4B below. The above study suggested looking at classical permutation enumeration questions (e.g. number of fixed points or cycles) after an ashuffle. This turned out to be surprisingly neat. For example, the expected number of fixed points is Ea(F p) = 1 + 1 a + 1 a 2 + . . . + 1 a n−1 . For cycles, the full story was derived by DiaconisMcGrathPitman (1995). Let Qa(n1, n2, . . . nn) be the chance that an ashuffle results in a permutation with ni icycles. They proved (2.1) Qa(n1 . . . nn) = 1 a n Yn i=1 ni + fi(a) ni with fi(a) = 1 i X di µ(d)a i/d The proof uses a remarkable bijection of Gessel and indeed gives a selfcontained proof of Gessel’s results – see GesselReutenauer (1993) for Gessel’s version with extensive application to enumerating permutations by descents and cycle structure. The formula 2.1 and some analysis show that features of a permutation that only depend on cycle structure become random before 3 2 log2 nshuffles; the length of the longest cycle is close to its uniform distribution after one shuffle. In a different direction, discussed further in Section 3B, Fulman (2002) has shown that the length of the longest increasing subsequence has its correct limiting distribution after 5 6 log2 n shuffles. These results also imply that the patience sorting solitaire game described by AldousDiaconis (1995) will then behave as if the deck was random. UyemuraReyes (2002) has studied the number of riffle shuffles required to randomize just a few cards e.g. the original top card. He derives bounds using coupling and remarkable formulas for how the eigenvalues of the GSR shuffles split by representations. His results generalize earlier profound work of Bergeron, Bergeron, Garsia (1989), and Hanlon (1990). They are discussed further in 4B below.
All of this shows that “seven shuffles suffice” is just a rough guide. From Theorem 1, it is where the cutoff happens. To finish off this part of the shuffling story we note that the analysis has been broadened to show that the age old custom of following shuffling by a random cut does not help appreciably in convergence. This is illustrated in BayerDiaconis (1992) and much more sharply in Fulman (2000B). This last paper connects shuffling with cuts to cyclic descent theory.
 Some Mathematical Connections A. Descent Theory A permutation w has a descent at i if wi+1 < wi . The set of all such i makes up the descent set D(w) ⊆ {1, 2, . . . , n − 1} = [n − 1]. Descents record the up down pattern in permutations and are a natural object of combinatorial study. Stanley [1972, 1986] lays out the classical theory and Buhler et al. (1994) make a fascinating connection to the mathematics of juggling. Stadler (1997) develops links between descents, shuffling and juggling for permutations of multisets. Let S ⊆ [n − 1] and let aS = X w:D(w)=S w. Louis Solomon (1976) observed that as elements of the group algebra Q[Sn], the aS are the basis for a subalgebra now called Solomon’s descent algebra. In particular aSaT = Σuc u ST au for c u ST ∈ Z. Solomon’s motivation was to give a group theoretic interpretation of Mackey’s induction theorem. He did this in a unified way for classical Weyl groups. The development he started now has a life of its own. The connection to shuffling cards comes through the following observations. The set of permutations with a single descent at position i (along with the identity) are exactly the permutations realized by removing an i element subset of 1, 2, . . . , n and placing them to the left (keeping all else in its same relative order). This is exactly the inverse riffle shuffles consonant with i cards cut. Summing in i, let A1 = Σn−1 i=1 ai this is the sum of all permutations with a single descent. Excepting the identity, it is also the result of an arbitrary inverse riffle. If Q is the GilbertShannonReeds measure then, as an element in Q[Sn], X w Q(w −1 )w = n + 1 2 n id + 1 2 n A1. Thus the neat convolution properties of the GSR measure show that if Ai is the sum of permutations with exactly i descents (so A0 = id), then A0, A1, . . . , An−1 are a basis for a commutative subalgebra of the descent algebra. In particular, AiAj = AjAi = Σc k ijAk. This commutative subalgebra of the descent algebra appears in BayerDiaconis (1992). As explained there, close relatives had been discovered by GerstenhaberSchack [1987] in their development of Hochschild Homology and by Loday [1988], Hanlon [1990] in their development of cyclic homology. The idempotents of this algebra act naturally on a complex constructed from the usual bar resolution and, for commutative algebras, commute with the boundary maps. Hence their kernel and image offer a natural Hodgetype splitting of the associated homology. It would take us too far afield to explain the connections between the descent algebra, the freeLie algebra, and Philip Hall’s commutator calculus. Fortunately, this has been splendidly carried out by Garsia (1990) and GarsiaReutenauer (1989) as summarized by Reutenauer [1993]. This book contains a central chapter on shuffle algebras. It omits most of the topics discussed in the present review! A number of other appearances of shuffling are in the series of papers by Nantel Bergeron (with several sets of coauthors) listed in the bibliography. These extend previous results to more general Coxeter groups, include applications to Vassiliev invariants and much else.
B. Connections with Symmetric Function Theory The theory of symmetric functions as developed by Stanley (1972, 1999) and Macdonald (1985) has had a great unifying effect on combinatorics. Many seemingly isolated facts about balls in boxes, permutations and partitions are nowadays seen as formulae for change of basis. Schurs symmetric functions are at the heart of this theory. A charming discovery of Stanley (2001) developed by Fulman (2002) shows how Schur functions arise in a natural way from riffle shuffling. Let θ1, θ2, . . . be nonnegative numbers that sum to one. Drop n balls into a set of boxes with θi the chance of a ball dropping into box i. Suppose the box counts are N1, N2, . . . with N1 + N2 + . . . = n. Take a deck of n cards; cut off the top N1, cards, then the next N2 cards (forming a separate pile), etc. of course, many of the piles may be empty. Riffle shuffle these piles together as in Section 2a. This results in a final permutation w. Apply the Schensted map to w to get a pair of standard Youngtableaux of the same shape λ. Proposition The probability that the above procedure results in the partition λ is the Schur function sλ times the dimension fλ of the associated representation of the symmetric group: sλ(θ1, θ2, . . .)fλ. Stanley’s proof of this proposition uses quasisymmetric functions, an emerging tool in algebraic combinatorics. Fulman’s proof of the proposition uses only classical facts from symmetric function theory. Both authors develop corollaries and variations. One striking application to shuffling due to Fulman shows that the distribution of features of a permutation dependent on the shape of the associated Youngtableaux e.g. the longest increasing subsequence – have the correct limiting distribution after 5 6 log2 n shuffles. Stanley (1999) (2002) is a good place to start reading about quasisymmetric functions. Aguiar and Sottile (2002), Billera, Hsiao and Van Willigenburg (2001) and Garsia, Wallach (2002) are relevant, significant studies. All have shuffles as part of their combinatorial essence.
3C Work of Fulman Some profound connections between shuffling and the enumerative theory of finite groups of Lie type have been developed by Jason Fulman. Some of this has already made an appearance above in Sections 2B and 3B. This section describes some further developments. Yet others appear in the rich collection of papers listed in the bibliography. One striking result of Fulman explains a mystery. A main result in DiaconisMcGrathPitman (1995) is a closed formula for the cycle structure of a permutation after an ashuffle (see (2.1) in Section 2C). It was also observed that this formula answers a second question: pick a random monic degree n polynomial x n + an−1x n−1 + . . . + a0 with coefficients in Fq by choosing a0, a1, . . . , an−1 from the uniform distribution. Factor this polynomial into irreducible factors and suppose there are ni irreducibles of degree i. The chance of a given n1, n2, . . . , nn occurring is given by (2.1) with a = q. This was proved by observing that two formulae agreed – that is, without understanding. Fulman [1998] found a conceptual explanation and an extension to other groups and shuffling schemes. Fulman’s explanation begins with a simply connected, semisimple group G de fined over Fq. Let G be the Lie algebra. Consider the orbits of semisimple elements of G under the adjoint action of G. For example, for groups of type A, G = SLn(Fq), G = s`(n, q) and semisimple elements correspond to monic degree n polynomials with coefficient of x n−1 vanishing. For types A and B, Fulman shows that there is a natural map Φ from the semisimple orbits to the conjugacy classes of the Weyl group W such that a uniformly chosen orbit maps to the measure induced by ashuffling with a = q. Thus a randomly chosen polynomial maps to an ashuffle and the factors map to cycles. For shuffles of type B, the correspondence is with symmetric polynomials f(z) = f(−z) In algebraic group theory there is an analog of the map Φ which carries semisimple conjugacy classes of the group G to conjugacy classes of the Weyl group. Picking a semisimple class uniformly induces a probability distribution on conjugacy classes. Fulman [1997] managed to find a card shuffling interpretation of this map as well and give an enumerative theory that works for all split semisimple groups. His work uses results of Cellini and Carter’s work on the Brauer complex. Indeed, Carter (2002) has recently extended Fulman’s work to more general groups. We give the card shuffling version of Fulman’s work for type A. Define an Fshuffle of a deck of 2n cards as follows: choose an even number j, between 1 and 2n with probability 2n 2j /2 2n−1 . Remove the top j cards of the deck. Remove the bottom j cards of the deck and place them on top of the original top j cards to form a packet of size 2j. Shuffle this packet with the remaining 2n − 2j cards. Fulman derives remarkable closed form generating functions for the cycles of a permutation after an Fshuffle. He also shows that Fshuffles convolve nicely and, for special deck sizes, gives an alternate description in terms of a riffle followed by a cut. The analogous developments for type B yield closed formulae for the cycles of randomly chosen unimodal permutations. These arise in dynamical systems and in social choice theory. One further aspect of Fulman’s work deserves special mention (and followup!). The shuffling work in DiaconisMcGrathPitman (1995) leans on a remarkable bijection of Gessel between multisets of primitive necklaces and permutations with cycle structure equal to that of the necklace. Fulman shows that by refining the correspondence Φ described above to a map to the Weyl group (instead of just to conjugacy classes) one recovers Gessel’s bijection in a group theoretically natural way.
3C. Work of Lalley Steve Lalley has written a series of papers studying extensions of the basic GilbertShannonReeds model to less uniform methods of riffle shuffling. Even changing the method of cutting the deck in two from a fair binomial distribution to a skewed binomial distribution with parameter p < 1 2 destroys a basic symmetry. For this case, Lalley [2000] conjectures that there is a sharp threshold for the mixing time at Cp log n for Cp = (3 + θp)/ log(1/p2 + q 2 ) with θp the unique solution of p θ + q θ = (p 2 + q 2 ) 2 . Observe that C1 2 = 3 2 log2 n in agreement with Theorem 1. Lalley [2000] and Fulman (1998) give upper and lower bounds of this form for the mixing time but sharp results are conjectural. Lalley [1996], [1999] expands the basic interlacing mechanism underlying the GSR shuffle. To explain, recall the dynamical systems description of GSR shuffles as the permutation induced by n uniform points in [0, 1] under the baker’s transformation x 7−→ 2x mod(1). This results in all interleaving being equally likely. It is natural to consider more general maps f : [0, 1] → [0, 1] which preserve Lebesgue measure. Lalley works with piecewise C 2 maps which are piecewise monotone increasing. He shows that several interpretable shuffles can be so described. For example, the biased cut shuffles described above or shuffles where the left card is dropped with probability uA/(uA + wB) when packets are of size A, B, here u, w are fixed parameters. When u = w = 1 2 this becomes the original GSR shuffle. The main result of Lalley [1996] shows that when n is large, for fixed i, the number Ni of cycles of length i after an fshuffle are approximately independent geometric random variables with P(Ni = k) = (1 − w)w k the parameter w depends on i and on the map f in a simple way. Further, the Ni are approximately independent. The main result of Lalley [1999] gives a lower bound for the number of fshuffles required to mix N cards; at least h −1 log N shuffles are needed where h is the ’fiber entropy’ associated to f. The proofs are a marvelous mix of ergodictheoretic symbolic dynamics and combinatorics. One interesting aspect of these fshuffles is that, aside from ashuffles, the successive permutations chosen for repeated convolution are not independent. They form a stationary sequence. This is not necessarily bad; perhaps real shufflers remember a few steps back – if a particularly lumpy shuffle was just made the next shuffle might be neater. See also DubrowFill (1995). There is much to follow up from Lalley’s work. Perhaps the leading problem is to prove any kind of upper bound for fshuffles or better, to determine where cutoffs appear.
3D. Early Shuffling The basic combinatorial shuffling of two sequences, one with m letters x1, . . . , xm and one with n letters y1, . . . , yn, into the formal sum of sequences of n + m letters in all orders that preserve the order of the x’s and the order of the y’s (thus n+m m terms) appears in other areas of algebra. Perhaps earliest is the classical wedge product of two alternating forms. If V is a vector space and f : V m → R, g : V n → R are alternating multilinear functions, then f ∧ g : V n+m → R may be constructed as the function f ∧ g(x1, . . . , xn+m) = X σ sgn(σ)f(xσ1 , . . . , xσm)g(xσm+1 , . . . , xσn+m) where the sum is over all shuffles. A splendid account of this classical subject appears in Cartan (1967, pg. 179188). The shuffling construction guarantees that f ∧ g is alternating, that f ∧ g = (−1)mng ∧ f and that the wedge product is associative. Cartan’s proof of this last statement results from the following fact: with three packets of cards of sizes , m, n, shuffling
into m and then the n into this joint packet results in the same distribution as shuffling in any of the other orders, or indeed shuffling the 3 packets together simultaneously as in the 3shuffles described in Section 2d. More general shuffles appear when studying flag manifolds. A flag is an increasing sequence of subspaces. If the successive dimensions are n1, n1 +n2, . . . then shuffles based on cutting off packets of size n1, n2, . . . appear. In particular, such shuffles index a basis for the homology of the associated flag variety. See Fulton (1997) or Shahshahani (2002) for textbook descriptions. EilenbergMacLane (see MacLane 1950) used the shuffle construction as a basic building block for constructing a chain complex giving an appropriate cohomology theory for Abelian groups. They get H2 (π, G) as the group of Abelian extensions of G by π. Shuffles appear frequently in other basic constructions in algebraic topology. For example, if X is a space with an associative, commutative product, Milgram (1967) defined a product on the classifying space B(X) using shuffles. This work was systematized by Steenrod (1967) and further by MacLane (1970). Shuffles appear in the EilenbergZilber Theorem and in explicit proofs of the K¨unneth formula giving a chain equivalence between a chain complex for the product of two spaces and the tensor product of the two chain complexes. See Hatcher (2002, pg. 278) for details and Dupont (2001, pg. 29) for a charming appearance in the world of scissors congruences! The essence of much of this is that the shuffling map gives a natural triangulation of the product of two simplices. ¿From a modern view, many of these appearances of shuffling occur because of the many natural Hopf algebras in mathematics. See SchneiderSternberg (1993) for references and pointers to Rees’ shuffle algebras and Chen’s iterated integrals. Perhaps even more basic, the permutatedron is the convex hull of all permutations of the vector (1, 2, 3, . . . , n) in Rn. It is a convex polytope with vertices indexed by permutations. It may be seen that the edges and faces of various dimensions are indexed by shuffles. See Billera and Sarangarajan (1996) for a clear statement and proof. It would be marvelous if some of what we know about shuffling illuminates these applications or vice versa. 4. Some Generalizations There are a bewildering variety of extensions of riffle shuffling where much of the successful analysis goes through. It is easiest to lead into this by considering inverse riffle shuffles where a subset is selected at random and moved to the top. A natural generalization is to partition [n] into ordered blocks (B1, B2, . . . , Bk). Then remove all cards with labels in block one and move these to the top (keeping the cards within a block in their original relative order). Next cards with labels in B2 are removed and put directly below those in B1, and so on with cards having labels in block k finishing at the bottom. Let B be the space of all ordered blocks of any shape if a weight w(B), B ∈ B is specified with w(B) ≥ 0 Σw(B) = 1, then a random walk can proceed. Inverse riffle shuffles and the GSR model proceed from the uniform distribution on the set of 2n partitions into two blocks. A widely studied special case puts weights w1, w2, . . . , wn on each card and then removes card i to top. This arises as a method of rearranging files so that frequently called for items are near the top. See Fill [1996] for an extensive survey. Curiously, the special case with wi = 1/n for all i is central in Wallach (1986) and GarsiaWallach (2002). As will emerge, there is a relatively complete theory for this class of walks – a description of stationary distribution, reasonable rates of convergence and a complete description of the associated eigenvalues. This will follow from the following sweeping generalization. A. Hyperplane Walks BidigareHanlonRockmore (1999) introduced a class of walks on chambers of a hyperplane arrangement which includes the walks above as a special case. Their works was completed in various ways by Bidigare (1997), Brown [2000, 2001], BrownDiaconis [1998]. BilleraBrownDiaconis (1999) offer an introduction. The story begins with a set A of affine hyperplanes in Rd . This cuts Rd into regions called chambers. These chambers are polyhedra with sides called the faces of the arrangement. For example, three lines in the plane in general position yield 7 chambers (2dimension), 9 one dimensional faces and three zerodimensional faces (the three points of intersection). Given a face F and a chamber C, the projection of F on C, written F C, is the unique chamber with F as a face and closest to C. Here closeness if measured by the number of hyperplanes in A one must cross in moving from C to F C. Let wF ≥ 0 ΣwF = 1 be weights on the faces of the arrangement A. Define a random walk on the set of chambers by moving from C to F C when F is chosen with probability wF . The theory depends on the lattice L of all possible intersections of elements in A. Here are the main theorems of BidigareHanlonRockmore [1999], BrownDiaconis [1998]. Theorem 1 Let A be a hyperplane arrangement in Rd . Let L be the intersection lattice of A and wF a probability measure on the faces. Then, the transition matrix of the Markov chain is diagonalizable. For each W ∈ L there is an eigenvalue λW = X F ≤W wF with multiplicity mW = µ(W, V ) = (−1)dim(W,V )µ(W, V ) where µ is the M¨obius function of L. Theorem 2 (a) The Markov chain of Theorem 1 has a unique stationary distribution π if and only if for each H ∈ A there is a face F not in H with wF > 0. (b) The stationary distribution in (a) can be described by sampling faces without replacement from wF to get an ordering F1, F2, . . .. Then, for any chamber C0, the product F1F2F3 . . . C0 is a chamber distributed from π. (c) For π as in (a), (b), and starting chamber C0 kKC0 − πk ≤ X H∈A λ
H To complete this section, let us show how these hyperplane walks extend riffle shuffles. The braid arrangement Ad consists of hyperplanes Hij = {x ∈ Rd : xi = xj}. All points within the same chamber have the same relative order so the chambers may be labeled with permutations. The faces are points in Rd which lie on some of the Hij and on various sides of the rest. These may be labeled by block ordered partitions (B1, B2, . . . , Bk) of [n]. Finally, the action F C of a block ordered partition on the permutation corresponding to C results from removing cards from the first block and moving to the top, etc., as described in the introduction to this section. The present description does not do justice to the wealth of examples of hyperplane arrangements where the chambers have natural names and the walk has a natural interpretation. We can only hope that the reader will consult the references above. B. Some Representation Theory I want to describe work of BergeronBergeronGarsia (1989), Hanlon [1990] and UyemuraReyes (2002) which shows a deep interplay between the shuffling schemes of Section A and the representation theory of the symmetric group. To keep things manageable, consider random walks on the braid arrangement driven by invariant face weights: w(F) = w(σF). This includes (uniform) random to top and inverse riffle shuffles as special cases. Let Q(σ) = ΣF id=σw(F). These walks may be described via repeated convolution by the probability measure Q. It is natural to ask how the eigenvalues of the walk split up by representation. Recall that the irreducible representations of Sn are indexed by partitions ν of n. If ρν(σ) is the associated matrix representation, we are asking about the eigenvalues of the matrix Qb(ν) = ΣσQ(σ)ρν(σ). By general theory (Diaconis (1988, Chapter 3E)) these are a subset of the eigenvalues from Theorem 1 in Section 4A Above. For the braid arrangement, the eigenvalues are indexed by block ordered partitions. However, because of the symmetry w(F) = w(σF), the eigenvalues only depend on the underlying number partition. Thus for each pair of partitions (µ, ν) we may ask how many times the eigenvalue λµ occurs in the matrix Qb(ρν). To describe the answer we need both the usual irreducible characters χν of Sn and the Lie characters ψµ (Reutenauer (1993, Chapter 8)). These Lie characters may be described by taking a permutation of cycle type µ in Sn. Its centralizer is a product of Wreath products SkwrCj . Take a ξ primitive jth root of 1, consider the one dimensional character of C k j which takes xj , . . . , xn to ξ x1+…+xk . This induces a one dimensional character of the Wreath Product. Taking a product of these 1dimensional characters over all factors in the centralizer and then inducing up from the centralizer to Sn gives ψµ. The main theorem below was proved by Hanlon [1990] for the case of GSR shuffles. Richard Stanley (personal communication) conjectured the general result which was proved by UyemuraReyes (2002). Theorem 3 For an Sn invariant hyperplane walk on the braid arrangement the multiplicity of the eigenvalue λµ of Theorem 1 in the ν th irreducible representation of Sn equals hχν, ψµi. Remarks (a) Lie characters have been extensively investigated when µ = (n), see Stembridge (1989), where an explicit decomposition formula is given. For general partitions µ, much less is known. Theorem three shows that the Sn invariant shuffles are equivalent objects in the group algebra. Any such shuffle is a linear combination of what may be called µ shuffles as described in the introduction to this section. As shown in DiaconisFillPitman [1992, Sec. 5], these µ shuffles form a basis for the descent algebra. (b) UyemuraReyes (2002) shows how the numbers described above allow bounds on how many shuffles of a given type are required to randomize a subset of cards, e.g. the original top card or top 13 cards. Here is one example. If k = log2 (n/c), after k inverse GSR shuffles, let Qk be the probability distribution of the position of the original top card. Then Qk is close to uniform if c is small: kQk − uk ≤ 1 − (1 − 2 −k ) n (c) These connections to representation theory are crucially used in Fulman (2000B) to get nice formulae for the cycle structure of shuffles followed by a cut. C. Brown’s Semigroup Walks Ken Brown [2000, 2001] has given a marvelous extension of the hyperplane walks which leads to interesting special cases and a conceptual explanation of why the eigenvalues of these nonsymmetric Markov chains are nonnegative real numbers. The brief treatment given here is a shuffling together of two of Brown’s papers and the reader is strongly encouraged to read the originals. Let S be a finite semigroup satisfying x 2 = x for all x ∈ S. A random walk is driven by a probability distribution w(x), x ∈ S. At each stage, one picks x from w(x) and multiplies on the left. Thus the transition matrix is K(s, t) = X x·s=t w(x) In all the examples, the state space of the walk is restricted to a left ideal I in S. Example 1. Hyperplane Walks. Let S be the set of faces of a hyperplane arrangement with I the set of chambers under the product of Section 4A. This product is idempotent and the results of Section 4A will be seen as special cases of the main theorem below. Example 2. q analogs Let MAT(n, , q) be the set of n ×
matrices of rank with coefficients in Fq. Let S = ∪ n
=1MAT(n, , q) and I = GLn(q) = MAT(n, n, q). Define a product on S as follows: If s has columns (s1, . . . . , s
) and t has columns (t1, . . . , tm) form s · t by appending the columns of t to the columns of s in order t1, t2, . . . deleting a ti if it is linearly dependent on the columns already there. This is an idempotent associative product and GLn(q) is an ideal. The “q = 1 case” consists of ordered strings from 1, 2, . . . n, without repeated values and the ideal becomes the symmetric group Sn. Thus if s = (3, 5) and t = (23145) st = 35214 and we see the move to the front chain. Example 3. The free idempotent semigroup Fn on 1, 2, . . . n, may be described as the equivalence class of finite strings under the equivalence relations w 2 = w for all subwords. For example, when n = 2, we get the six strings S = {1, 2, 12, 21, 121, 212} Brown (following Green and Reees (1952)) shows that F3 has order 159 and Fn has order Pn i=1 n i Qi j=1(i − j + 1)2 j . Let I be the ideal of all words having each of {1, 2, . . . , n} appearing at least once (for n = 2, I = {12, 21, 121, 212}). Any probability measure on S induces a Markov chain on I by left multiplication. Return now to the general case of an idempotent semigroup S. Brown introduces a support map supp : S → L with L an explicitly constructed semilattice. The support map is a subjection satisfying supp (xy) = supp x∨ supp y and supp x ≥ supp y if and only if x = xyx. The set L indexes the eigenvalues of the walk. For hyperplane walks, L is the intersection lattice. For matrices, L is the subspace spanned by the columns. For the free idempotent semigroup L is the collection of subsets of {1, 2, . . . , n} under union. The natural ideal I is the two sided ideal. {x : supp x = b1}. This specializes to the ideals given in the three examples above. Brown gives a version of theorems one and two of Section 4A: For each X ∈ L there is an eigenvalue λX = P supp x≤X w(x) with a neat way of computing multiplicities. If the product of x with wx 6= 0 is in I then there is a unique stationary distribution π which may be described as the distribution of the random element x1x2 . . . c0 with x1, x2, . . . sampled without replacement from w(x). Finally, for any starting state C0 ∈ I, kKC0 − πk ≤ X H λ ` H where H ranges over the maximal elements of L. The key to the analysis is a surprising, complete character theory. (Most semigroups do not have a reasonable character theory.) Brown shows that all representations of S are one dimensional and that the representations are indexed by L; the ’Fourier transform’ of the random walk now yields the eigenvalues. One aspect of Theorem 1 that needn’t go through: the Markov chain needn’t be diagonalizable. To help the reader navigate, Brown first worked in idempotent semigroups satisfying the additional identity xyx = xy. These are called left regular bands in the semigroup literature; most of the examples considered above are left regular bands. Under this condition the chain is diagonalizable. In later work, Brown showed that nearly everything goes through in the general case. There is a tantalizing extension to a walk on the chambers of a building. Here, while a product is well defined, it is not associative. This creates a mess but there are some positive results as well. 5. Some Open Problems 1. Almost none of the walks presented here have good lower bounds available. Examples include riffle shuffles with the deck cut exactly in two (see Section 3A) or any of Fulman’s shuffles (Section 3C). It would be nice to have a lower bound in some generality for the general hyperplane walks of Section 4A. Usually, reasonable lower bounds are easier to prove than upper bounds. See [Diaconis, 1988] or [SaloffCoste 1997] for the usual techniques. One idea for a systematic approach: Brown’s method (Section 4C) finds a representation theoretic interpretation. With characters available, perhaps David Wilson’s [2001] approach may be pushed through. 2. It should be the case that essentially all the walks discussed here show a sharp cutoff in their approach to stationarity; proving this requires sharp upper bounds as well as sharp lower bounds. The general upper bounds (e.g., Theorem 2 of Section 4A) are often slightly off in the few cases where sharp answers are known. For example, for ordinary riffle shuffles, the general approach shows 2 log2 n + C shuffles suffice for randomness while Theorem 1 of Section 2A shows the right answer is 3 2 log2 n + C. The original paper of BidigareHanlonRockmore gives a potentially sharper upper bound. It would be very instructive to compare the two variations. In preliminary work, BrownDiaconis [1998] found them similar but UyemuraReyes [2002] found examples where the BHR bound is a genuine improvement. It may be that the bounds of BHR or Theorem 2 of Section 4A are sharp for some other metric; this happens for ordinary riffle shuffles with separation distance as discussed in Section 2C. At a more abstract level, it may be possible to prove the existence of a sharp threshold without being able to locate it along the line of concentration inequalities see Ledoux [2000].
 A very clear set of problems is to give any kind of upper bounds for Lalley’s fshuffles of Section 3C. Presumably, these all mix n cards in order log n steps but at present we don’t know that order 2n steps suffice.
For practical reasons it is natural to seek models of riffle shuffling cards that result in neater shuffles than the GSR shuffles. This arises in studying the way Las Vegas dealers shuffle; they drop cards in close to perfect alternation while the GSR method has packet sizes geometrically distributed. Here is a suggestion whose analysis is completely open: the Markovian Model is driven by a 2state Markov chain with transition matrix. p00 p01 p10 p11! To shuffle a deck of n cards, run the chain starting in stationarity to produce x1, x2, . . . , xn a binary sequence. If this sequence has k zeros and n − k ones, cut off the top k cards as a left hand pile, the n−k remaining as a right hand pile. Use the zeros and ones (from right to left say) to dictate if the next drop is from left or right. For example, with n = 10 cards the sequence 0101100110 results in 5 cards being cut off and the final arrangement 1, 6, 2, 7, 8, 3, 4, 9, 10, 5. This includes the GSR model by taking pij = 1/2 and perfect shuffles by taking p01 = p10 = 1. It is natural to begin with a symmetric cut, so p01 = p10 and p00 = p11. There is every hope that this model will produce neat and useful analyses. It must be the case that (for symmetric shuffles with 0 < p01 < 1) there is a sharp threshold at θ log n + C with θ = θ(p). For practical purposes one could estimate θ(p) from computer experiments and also estimate p by watching dealers shuffle. This would allow one to derive reasonable ways of exploiting the structure if the dealers do not shuffle enough. The following seems clear: Since pij = 1/2 requires seven shuffles and this is the fastest method, most neat shufflers will require a good many more shuffles and there will be plenty of structure to take advantage of! Incidentally, the case of ’random perfect shuffles’, each time choosing randomly to do a perfect in or out shuffle, has been analyzed by UyemuraReyes (2002). For perfect shuffles, not all permutations are possible so the walk is random on a subgroup. For decks of size 2k , he shows order k 2 steps are necessary and suffice. The case of general decks remains open.
I want to record empirical work which yields a quite different natural model for riffle shuffling. In joint work with my student Arnab Chakraborty we studied commonly available machines for shuffling cards. These machines have the user cut the deck in two halves (placed into the left and right sides of the machine). Then a button is pushed which activates rubber wheels touching the bottoms of the two packets. These spin cards off the bottoms into a central region where they drop onto a collecting place. At the end of one shuffle the user retrieves the deck, cuts it into two and the process continues. In our empirical work we found that an “opposite” GSR model seemed to fit the data. In the GSR model, if at some stage there are A cards in the left half and B cards in the right half, the chance of dropping the next card from the left half is A/(A + B). In the mechanical shuffler, the chance seemed to be B/(A + B); it was more likely for a card to be dropped from the smaller half. Of course, once all the cards in a half are used up, the remaining cards are dropped on top. This seems like a natural candidate for careful study. One may interpolate between models with a oneparameter family of the following form. If at some stage there are A cards in the left half and B cards in the right half, the chance of dropping the next card from the left is AθB1−θ/(AθB1−θ + BθA1−θ ). Again, for practical purposes θ could be fit from data and a cutoff parameter could be estimated by simulation.
The GSR is the most uniform single riffle shuffle in the sense that, given the cut, it makes all shuffles equally likely. It is not clear (though it seems plausible) that it is also the probability on shuffles so that Q ∗ Q is closest to uniform (as well as Q∗k for all k). This may be a simple problem but it seems worth clarifying. A similar case that would shed some light: on Z/mZ (the integers mod m), consider all probability measures with support in [−a, a]. Find the probability P in this set such that P ∗k is closest to uniform (say in entropy or total variation distance). Is this uniform?
A beautiful set of conjectures has arisen from thesis work of J.C. UyemuraReyes. To describe them, consider first the random to top shuffle. This has eigenvalues 0, 1/n, 2/n, . . .(n − 2)/n, 1 with multiplicity of j/n the number of permutations in Sn with j fixed points. This was proved by Phatarphod and independently by Wallach (1986) and follows from Theorem 1 of Section 4A. Next consider the multiplicative reversibilization of random to top. This is random to top followed by top to random: It may also be described as: remove a random card and insert it in a random position. Numerical work shows that the eigenvalues are all of the form quadratic function of (j)/n2 . For some cases this can be proved. For example, zero occurs with multiplicity the number of derrangements and the eigenvalues in representations near the trivial representation (or alternating representation) can be proved of this form. There must be a way to understand these! Work of Phil Hanlon and Patricia Hersh indicates that this question fits very neatly into algebra, along the lines of Section 3A. Using the available results one may conjecture where the cutoff occurs for mixing. For either random to top or top to random, n log n is the cutoff. It seems that random to random must be faster but perhaps not by more than a factor of two. We conjecture that 3/4n log n is the cutoff here. This is what is required to kill the eigenvalues from the n − 1 dimensional representation. UyemuraReyes [2002] proves a lower bound of form 1 2 n log n and an upper bound of 4n log n. At present writing we do not know that n 2× eigenvalue is an integer. Again, in preliminary work, it seems as if the eigenvalues of the multiplicative symmetrization of any hyperplane walk from a Coxeter group with symmetric face weights generated by a finite reflection group will be “nice” in the same sense. To be specific, consider the permutation group Sn. Fix a composition µ = (µ1, µ2, . . . , µr) of n. A symmetric µ shuffle removes a uniformly chosen subset of µ1 cards (keeping them in their same relative order, then, from the remaining n − µ1 cards, a random subset of size µ2, and so on. With a final packet of size µr. These r packets are shuffled together by a GSR shuffle.
A very simple to state conjecture: After k GSR shuffles of n cards consider turning up cards from the top one at a time. What is the optimal guessing strategy to maximize the expected number of correct guesses? A conjectured optimal strategy due to McGrath is described in BayerDiaconis [1992]. There is related work in Ciucu (1998). Prove that McGrath’s strategy is optimal. An easier version (still open) asks the same question following k top to random shuffles with k fixed and known.
Less of a conjecture than a suggestion; many of the semigroup walks in Brown [2000] seem worthwhile studying in depth. To take one example; Brown [2000, Section 6.3] introduced a fascinating family of walks on phylogenetic trees. Walks on such trees are currently an active area of study. See DiaconisHolmes [2002] for pointers to work by Aldous and to the currently very active work in biology. Brown’s walks are driven by weights wij . A first natural problem is to study the stationary distribution of Brown’s walk as a natural family of nonuniform distributions on trees. They carry over to trees the Luce model which has been very actively studied for permutations. One might even contemplate estimating Brown’s parameters wij from data. It is also natural to carry out some careful analyses of rates of convergence for natural families of weights: randomly chosen i.i.d. uniform weights, Zipf type weights, or wij the distance from i to j in some natural geometric structure.
A most annoying problem: find some use for the eigenvalues of the many walks in the section above. For reversible Markov chains there are good bounds on the rate of convergence based on eigenvalues. Are there any explicit bounds on, e.g., L 2 distances for nonsymmetric chains? Going further, are there bounds on the multiplicative symmetrization of a chain based on knowledge of the eigenvalues of the original chain? This would allow the wealth of eigenvalue information reported above to be used for comparison purposes as explained in SaloffCoste [1997].