## Set Theory

Set theory is the study of sets, or collections of items. It includes concepts and terms like sample space, mutually exclusive, disjoint, union, event, and intersection, along with related symbols like $\cup$ and $\in$. Set theory helps you understand probabilities, and specifically how to set up a problem so that you calculate the correct probability of an event. This can be useful, especially during interviews, as a lot of brain teasers involve probability calculations. Set theory also clarifies some concepts in other areas of statistics, particularly when we start to divide things into partitions. Finally, the notation used in set theory is prevalent throughout math and statistics.

With that said, set theory is fairly abstract and can be hard to understand initially, and a lot of it can seem like a bunch of semantics. Because of this, in the grand scheme of things, set theory is not that important a concept outside of learning the symbols, and you could potentially skip it.

The bulk of this page covers symbols and terminology. The terminology is a bit less important than the symbols. Note that within set theory, we make a distinction between a set and an event. Depending on whether something is viewed as a set or an event, the interpretation differs slightly. Interpreting things as events will be more relevant when we talk about probabilities.

## Terminology

Experiment: In statistics, an experiment is a process whose possible outcomes can be determined ahead of time, even though the actual outcome is not known until the experiment is run. For example, an experiment could be flipping a coin, and the possible outcomes would be heads or tails.

Event: An event is a set of possible outcomes of an experiment. For example, different events when rolling a single die could be rolling a 1, rolling an even number, rolling a 5 or 6, and so on. The difference between an outcome and an event is that an outcome is a single result, while an event is usually a collection (set) of outcomes.
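As a rough illustration, the die events above can be written as Python sets. This is only a sketch: the variable names and the observed roll are made up for the example.

```python
# Sample space for rolling a single six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Events from the text, each written as a set of outcomes.
roll_one = {1}
roll_even = {n for n in sample_space if n % 2 == 0}
roll_five_or_six = {5, 6}

# Every event is a subset of the sample space.
for event in (roll_one, roll_even, roll_five_or_six):
    assert event <= sample_space

# An event "occurs" when the observed outcome is one of its members.
outcome = 4  # hypothetical observed roll
print(outcome in roll_even)         # True
print(outcome in roll_five_or_six)  # False
```

Here a single outcome is one number, while each event is a set that may hold one or several outcomes.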

As mentioned before, there also is a difference between events and sets.  While both sets and events contain outcomes, an event usually implies that something can occur (and often has an associated probability), and a set is purely a list of outcomes. Note that events can contain sets and sets can contain events.

Sample Space: A sample space is all the possible outcomes of an experiment.  For example, the sample space for tossing 2 coins is: $S = [HH, HT, TH, TT]$.  By definition, a sample space is also either a set or an event. Note: Don’t let the word sample confuse you as it is not really related to “sample” when we talk about sample versus population.
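One way to see where this sample space comes from is to enumerate it. The sketch below assumes each coin has exactly two outcomes, labeled `"H"` and `"T"`, and that the order of the tosses matters.

```python
from itertools import product

# Each coin can land heads ("H") or tails ("T").
coin = ["H", "T"]

# All ordered pairs of results for two tosses, joined into strings.
sample_space = {"".join(pair) for pair in product(coin, repeat=2)}

print(sorted(sample_space))  # ['HH', 'HT', 'TH', 'TT']
```

Changing `repeat=2` to another number would enumerate the sample space for that many tosses.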

Empty Set: An empty set is more of a theoretical concept and exists for conceptual reasons. Similar to philosophy, where if there is being, then there is nothingness, in math, if there are sets, then there are empty sets. Previously, we talked about a set or space containing items, outcomes, or elements. An empty set is just a set that does not contain any elements or members, and it is denoted as $\emptyset$. If we view an empty set as an event rather than a set, it represents an event that is impossible or cannot occur. You may sometimes see an empty set referred to as a null set.

By definition, $\emptyset \subset A$, where $A$ is an arbitrary event.
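A small sketch of the empty set in Python, where `A` is a hypothetical event chosen only for illustration. Python's `<=` operator on sets tests whether the left side is a subset of the right side.

```python
A = {1, 2}     # an arbitrary event (hypothetical)
empty = set()  # Python's empty set; note that {} would create an empty dict

print(len(empty))      # 0
print(empty <= A)      # True: the empty set is a subset of A
print(empty <= empty)  # True: it is also a subset of itself
```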

Complement: The complement of a set $A$ is another set that contains all the elements of the sample space that are not in $A$. The notation for the complement of $A$ is usually a superscript c, so the complement is written $A^c$. As with most terms here, the interpretation of a complement changes depending on whether you view it as a set or an event. When it comes to events, $A^c$ can be thought of as the event that $A$ does not occur. When it comes to sets, as mentioned already, $A^c$ can be thought of as the set that includes all the items not in $A$.

For example, if we are talking about rolling a die, and we define $A$ as the event where we roll a 1 or 2, then $A^c$ is the event of rolling a 3, 4, 5, or 6.
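The die example can be checked with Python sets. This sketch assumes a standard six-sided die and uses set difference against the sample space to form the complement.

```python
sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of one die roll
A = {1, 2}                         # event: roll a 1 or 2

# The complement is everything in the sample space that is not in A.
A_complement = sample_space - A

print(sorted(A_complement))  # [3, 4, 5, 6]
```

Note that a complement is always taken relative to a sample space, which is why `sample_space` appears explicitly in the code.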

The following properties also hold (the last two are known as De Morgan's laws):

$(A^c)^c = A$

$\emptyset^c = S$ where $S$ is the sample space

$S^c = \emptyset$

$(A_1 \cup A_2)^c = A_1^c \cap A_2^c$

$(A_1 \cap A_2)^c = A_1^c \cup A_2^c$
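These properties can be spot-checked on a concrete example. The sketch below assumes a die-roll sample space and two arbitrary events, and uses a hypothetical helper named `complement`. In Python, `|` is set union and `&` is set intersection. Passing the checks for one example illustrates the properties but does not prove them.

```python
S = {1, 2, 3, 4, 5, 6}  # sample space (one die roll)


def complement(event, space=S):
    """Return the outcomes in `space` that are not in `event`."""
    return space - event


A1 = {1, 2, 3}  # hypothetical event
A2 = {3, 4}     # hypothetical event

# Complement of a complement gives back the original event.
assert complement(complement(A1)) == A1

# The empty set and the sample space are complements of each other.
assert complement(set()) == S
assert complement(S) == set()

# De Morgan's laws.
assert complement(A1 | A2) == complement(A1) & complement(A2)
assert complement(A1 & A2) == complement(A1) | complement(A2)

print("all properties hold for this example")
```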

Union: The union of two sets or events is denoted with the symbol $\cup$. If $A_1$ and $A_2$ are sets, we can think of $A_1 \cup A_2$ as the combination of the items within the two sets, without any repeats (that is, we eliminate double counting). For example, if:

$A_1 = [1, 2, 3, 4]$

$A_2 = [3, 4, 5]$