## Table Tutorial

[Table](https://hail.is/docs/0.2/hail.Table.html) is Hail's distributed analogue of a data frame or SQL table. It will be familiar if you've used R or `pandas`, but `Table` differs in 3 important ways:

- It is distributed. Hail tables can store far more data than can fit on a single computer.
- It carries global fields.
- It is keyed.

A `Table` has two different kinds of fields:

- global fields
- row fields
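
To make the two kinds of fields and the key concrete, here is a minimal plain-Python sketch. It is a toy model, not Hail's implementation or API: the names `ToyTable`, `globals_`, and `lookup` are hypothetical and exist only for illustration, and nothing here is distributed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyTable:
    """Toy stand-in for a keyed table (hypothetical, not part of Hail)."""
    key: str                                                  # name of the row field acting as the key
    globals_: Dict[str, Any] = field(default_factory=dict)    # global fields: one value for the whole table
    rows: List[Dict[str, Any]] = field(default_factory=list)  # row fields: one value per row

    def lookup(self, key_value: Any) -> List[Dict[str, Any]]:
        """Return every row whose key field equals key_value."""
        return [row for row in self.rows if row[self.key] == key_value]


toy = ToyTable(
    key="id",
    globals_={"source": "example data"},
    rows=[
        {"id": 1, "occupation": "technician"},
        {"id": 2, "occupation": "writer"},
    ],
)

print(toy.globals_["source"])  # a global field is stored once, not per row
print(toy.lookup(2))           # the key identifies rows
```

The point of the sketch is only the shape of the data: a global field holds a single value shared by the entire table, while each row field holds one value per row, and the key names the row field used to identify rows.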

### Importing and Reading

Hail can [import](https://hail.is/docs/0.2/methods/impex.html) data from many sources: TSV and CSV files, JSON files, FAM files, databases, Spark, etc. It can also *read* (and *write*) a native Hail format.

You can read a dataset with [hl.read_table](https://hail.is/docs/0.2/methods/impex.html#hail.methods.read_table). It takes a path and returns a `Table`. The `ht` in the file extension stands for Hail Table.

We've provided a method to download and import [the MovieLens dataset](https://grouplens.org/datasets/movielens/100k/) of movie ratings in the Hail native format. Let's read it!

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=https://dx.doi.org/10.1145/2827872.

In [None]:
import hail as hl
hl.init()

In [None]:
hl.utils.get_movie_lens('data/')

In [None]:
users = hl.read_table('data/users.ht')

### Exploring Tables

The [describe](https://hail.is/docs/0.2/hail.Table.html#hail.Table.describe) method prints the structure of a table: the fields and their types.

In [None]:
users.describe()

You can view the first few rows of the table using [show](https://hail.is/docs/0.2/hail.Table.html#hail.Table.show).

By default, `show` displays 10 rows. Try changing the code in the cell below to `users.show(5)`.

In [None]:
users.show()

You can [count](https://hail.is/docs/0.2/hail.Table.html#hail.Table.count) the rows of a table.

In [None]:
users.count()

You can access fields of tables with the Python attribute notation `table.field`, or with index notation `table['field']`. The latter is useful when the field names are not valid Python identifiers (if a field name includes a space, for example).

In [None]:
users.occupation.describe()

In [None]:
users['occupation'].describe()
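
As a rough way to see which field names force you into index notation, the sketch below uses only the Python standard library to check whether a name could be written after a dot at all. The helper `attribute_notation_works` and every field name other than `occupation` are hypothetical, and the check covers Python syntax only, not anything specific to Hail.

```python
import keyword


def attribute_notation_works(field_name: str) -> bool:
    """Return True if `table.<field_name>` is syntactically valid Python."""
    # A usable attribute name must be an identifier and must not be a reserved keyword.
    return field_name.isidentifier() and not keyword.iskeyword(field_name)


# Hypothetical field names, chosen to cover the common failure cases.
for name in ["occupation", "zip code", "2nd_choice", "class"]:
    if attribute_notation_works(name):
        notation = "table." + name
    else:
        notation = f"table[{name!r}]"
    print(f"{name!r}: use {notation}")
```

Names containing a space, starting with a digit, or matching a Python keyword all fail the check, so they can only be reached with `table['field']`.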

`users.occupation` and `users['occupation']` are [Hail Expressions](https://hail.is/docs/0.2/expressions.html).

Let's peek at their contents using `show`. Notice that the key is shown as well!

In [None]:
users.occupation.show()

### Exercise

The movie dataset has two other tables: `movies.ht` and `ratings.ht`. Load these tables and have a quick look around.