Python Data Types

The operation of a Python program hinges on the data it handles. Data values in Python are known as objects; each object, AKA value, has a type. An object’s type determines which operations the object supports (in other words, which operations you can perform on the value). The type also determines the object’s attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object.

The built-in type(obj) accepts any object as its argument and returns the type object that is the type of obj. The built-in function isinstance(obj, type) returns True when object obj has type type (or any subclass thereof); otherwise, it returns False.
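As a quick illustration of these two built-ins, here is a minimal sketch, assuming Python 3 (the variable names are chosen only for this example):

```python
# type() returns the exact type object; isinstance() also accepts subclasses.
n = 42
type_of_n = type(n)                      # the type object int
is_int = isinstance(n, int)              # True

# bool is a subclass of int, so True counts as an instance of int,
# even though its exact type is bool.
bool_is_int = isinstance(True, int)      # True
exact_type_is_int = type(True) is int    # False

print(type_of_n, is_int, bool_is_int, exact_type_is_int)
```

When a check should accept subclasses, isinstance is usually the better fit than comparing type(obj) directly.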

Python has built-in types for fundamental data types such as numbers, strings, tuples, lists, dictionaries, and sets, as covered in the following sections. You can also create user-defined types, known as classes.

Numbers

The built-in numeric types in Python include integers (int and long, in v2; in v3, there’s no distinction between kinds of integers), floating-point numbers, and complex numbers. The standard library also offers decimal floating-point numbers, and fractions. All numbers in Python are immutable objects; therefore, when you perform an operation on a number object, you produce a new number object.
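The following sketch, assuming Python 3, shows what immutability means for numbers and gives a first taste of the standard library's decimal and fractions modules (the variable names are illustrative only):

```python
from decimal import Decimal
from fractions import Fraction

a = 10
b = a          # a and b refer to the same int object
a += 1         # builds a new int and rebinds a; b still refers to 10

# Decimal and Fraction perform exact arithmetic on their own kinds of values.
decimal_sum = Decimal('0.1') + Decimal('0.2')    # Decimal('0.3')
fraction_sum = Fraction(1, 3) + Fraction(1, 6)   # Fraction(1, 2)

print(a, b, decimal_sum, fraction_sum)
```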

Numeric literals do not include a sign: a leading + or -, if present, is a separate operator.

Integer numbers

Integer literals can be decimal, binary, octal, or hexadecimal. A decimal literal is a sequence of digits in which the first digit is nonzero. A binary literal is 0b followed by a sequence of binary digits (0 or 1). An octal literal, in v2 only, can be 0 followed by a sequence of octal digits (0 to 7). This syntax can be quite misleading for the reader, and we do not recommend it; rather, use 0o followed by a sequence of octal digits, which works in both v2 and v3 and does not risk misleading the reader. A hexadecimal literal is 0x followed by a sequence of hexadecimal digits (0 to 9 and A to F, in either upper- or lowercase). For example:

1, 23, 3493          # Decimal integer literals
0b010101, 0b110010   # Binary integer literals
0o1, 0o27, 0o6645    # Octal integer literals
0x1, 0x17, 0xDA5     # Hexadecimal integer literals

Integer literals have no defined upper bound (in v2 only, if greater than sys.maxint, integer literals are instances of built-in type long; v3 does not draw that distinction, but rather uses int as the type of all integers).
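To make this concrete, the sketch below (assuming Python 3) checks that literals written in different bases denote the same int value, and that integers grow past any fixed machine word size:

```python
# The same value written in decimal, binary, octal, and hexadecimal.
same_value = 23 == 0b10111 == 0o27 == 0x17    # True

# Integers are unbounded: this value does not fit in 64 bits.
big = 2 ** 100
digit_count = len(str(big))                   # 31 decimal digits

print(same_value, big, type(big), digit_count)
```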

Floating-point numbers

A floating-point literal is a sequence of decimal digits that includes a decimal point (.), an exponent suffix (an e or E, optionally followed by + or -, followed by one or more digits), or both. The leading character of a floating-point literal cannot be e or E; it may be any digit or a period (.). For example:

0., 0.0, .0, 1., 1.0, 1e0, 1.e0, 1.0e0 # Floating-point literals

A Python floating-point value corresponds to a C double and shares its limits of range and precision, typically 53 bits of precision on modern platforms. (For the exact range and precision of floating-point values on the current platform, see sys.float_info: we do not cover that in this book—see the online docs.)
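As a brief sketch of what sys.float_info exposes (assuming Python 3; the exact numbers depend on the platform, so none are shown as expected output):

```python
import sys

info = sys.float_info
mantissa_bits = info.mant_dig   # typically 53 for an IEEE 754 double
largest = info.max              # largest representable finite float
epsilon = info.epsilon          # gap between 1.0 and the next larger float

print(mantissa_bits, largest, epsilon)

# Limited precision means some decimal fractions are only approximated;
# on typical IEEE 754 platforms this comparison prints False.
print(0.1 + 0.2 == 0.3)
```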

Complex numbers

A complex number is made up of two floating-point values, one each for the real and imaginary parts. You can access the parts of a complex object z as read-only attributes z.real and z.imag. You can specify an imaginary literal as a floating-point or decimal literal followed by a j or J:

0j, 0.j, 0.0j, .0j, 1j, 1.j, 1.0j, 1e0j, 1.e0j, 1.0e0j

The j at the end of the literal indicates the square root of -1, as commonly used in electrical engineering (some other disciplines use i for this purpose, but Python has chosen j). There are no other complex literals. To denote any constant complex number, add or subtract a floating-point (or integer) literal and an imaginary one. For example, to denote the complex number that equals one, use expressions like 1+0j or 1.0+0.0j. Python performs the addition or subtraction at compile time.
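A minimal sketch of these points, assuming Python 3 (the name z follows the text; the other names are illustrative):

```python
z = 3 + 4j                 # integer literal plus imaginary literal
real_part = z.real         # 3.0 (always a float)
imag_part = z.imag         # 4.0 (always a float)
magnitude = abs(z)         # 5.0

# The parts are read-only: assigning to them raises AttributeError.
error = None
try:
    z.real = 1.0
except AttributeError as exc:
    error = exc

print(real_part, imag_part, magnitude, type(error).__name__)
```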

New in 3.6: Underscores in numeric literals

To assist visual assessment of the magnitude of a number, from 3.6 onward numeric literals can include single underscore (_) characters between digits or after any base specifier. As this implies, not only decimal numeric constants can benefit from this new notational freedom:

>>> 100_000.000_0001, 0x_FF_FF, 0o7_777, 0b_1010_1010
(100000.0000001, 65535, 4095, 170)

Sequences

A sequence is an ordered container of items, indexed by integers. Python has built-in sequence types known as strings (bytes and Unicode), tuples, and lists. Library and extension modules provide other sequence types, and you can write yet others yourself. You can manipulate sequences in a variety of ways.

Iterables

A Python concept that generalizes the idea of “sequence” is that of iterables. All sequences are iterable: whenever we say you can use an iterable, you can in particular use a sequence (for example, a list).

Also, when we say that you can use an iterable, we mean, usually, a bounded iterable: an iterable that eventually stops yielding items. All sequences are bounded. Iterables, in general, can be unbounded, but if you try to use an unbounded iterable without special precautions, you could produce a program that never terminates, or one that exhausts all available memory.
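One possible "special precaution" is to take only a bounded slice of an unbounded iterable. The sketch below assumes Python 3 and uses itertools from the standard library:

```python
import itertools

# itertools.count yields 0, 2, 4, ... forever: an unbounded iterable.
evens = itertools.count(0, 2)

# Calling list(evens) directly would never finish; islice bounds it first.
first_five = list(itertools.islice(evens, 5))

print(first_five)   # [0, 2, 4, 6, 8]
```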

Strings

A built-in string object (bytes or Unicode) is a sequence of characters used to store and represent text-based information (byte strings, also known as byte objects, store and represent arbitrary sequences of binary bytes). Strings in Python are immutable: when you perform an operation on strings, you always produce a new string object, rather than mutating an existing string. String objects provide many methods.
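A short sketch of string immutability, assuming Python 3 (variable names are illustrative):

```python
s = 'python'
upper = s.upper()      # returns a new string; s itself is unchanged

# Item assignment is not supported, because strings are immutable.
error = None
try:
    s[0] = 'P'
except TypeError as exc:
    error = exc

print(s, upper, type(error).__name__)
```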

Different string types in v2 and v3

In v2, unadorned string literals denote byte strings; such literals denote Unicode (AKA text) strings in v3.

A string literal can be quoted or triple-quoted. A quoted string is a sequence of zero or more characters within matching quotes, single (') or double ("). For example:

'This is a literal string'
"This is another string"

The two different kinds of quotes function identically; having both lets you include one kind of quote inside of a string specified with the other kind, with no need to escape quote characters with the backslash character (\):

'I\'m a Python fanatic' # a quote can be escaped
"I'm a Python fanatic"  # this way is more readable

Other things equal, using single quotes to denote string literals is better Python style. To have a string literal span multiple physical lines, you can use a \ as the last character of a line to indicate that the next line is a continuation:

'A not very long string \
that spans two lines'       # comment not allowed on previous line

To make the string contain two lines, you can embed a newline in the string:

'A not very long string\n\
that prints on two lines'   # comment not allowed on previous line

A better approach is to use a triple-quoted string, enclosed by matching triplets of quote characters (''' or, more commonly, """):

"""An even bigger
string that spans
three lines"""           # comments not allowed on previous lines

In a triple-quoted string literal, line breaks in the literal remain as newline characters in the resulting string object. You can start a triple-quoted literal with a backslash immediately followed by a newline, to avoid having the first line of the literal string’s content at a different indentation level from the rest. For example:

the_text = """\
First line
Second line
"""               # like 'First line\nSecond line\n' but more readable

The only character that cannot be part of a triple-quoted string is an unescaped backslash, while a quoted string cannot contain unescaped backslashes, nor line ends, nor the quote character that encloses it. The backslash character starts an escape sequence, which lets you introduce any character in either kind of string. We list Python’s string escape sequences in the following table.

Sequence	Meaning	ASCII/ISO code
\<newline>	Ignore end of line	None
\\	Backslash	0x5c
\'	Single quote	0x27
\"	Double quote	0x22
\a	Bell	0x07
\b	Backspace	0x08
\f	Form feed	0x0c
\n	Newline	0x0a
\r	Carriage return	0x0d
\t	Tab	0x09
\v	Vertical tab	0x0b
\DDD	Octal value DDD	As given
\xXX	Hexadecimal value XX	As given
\other	Any other character: a two-character string	0x5c + as given

A variant of a string literal is a raw string. The syntax is the same as for quoted or triple-quoted string literals, except that an r or R immediately precedes the leading quote. In raw strings, escape sequences are not interpreted as in the table, but are literally copied into the string, including backslashes and newline characters. Raw string syntax is handy for strings that include many backslashes, especially regular expression patterns. A raw string cannot end with an odd number of backslashes: the last one would be taken as escaping the terminating quote.
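The difference between ordinary and raw literals can be sketched as follows, assuming Python 3 (the path and names are made up for illustration):

```python
# Both literals denote the same string object contents.
plain = 'C:\\temp\\new'     # backslashes escaped one by one
raw = r'C:\temp\new'        # raw literal: backslashes copied literally
same = plain == raw         # True

# '\n' is one newline character; r'\n' is a backslash followed by n.
len_escaped = len('\n')     # 1
len_raw = len(r'\n')        # 2

print(same, len_escaped, len_raw)
```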

In Unicode string literals you can use \u followed by four hex digits, and \U followed by eight hex digits, to denote Unicode characters, and can also include the same escape sequences listed in the preceding table. Unicode literals can also include the escape sequence \N{name}, where name is a standard Unicode name, as listed at http://www.unicode.org/charts/. For example, \N{Copyright Sign} indicates a Unicode copyright sign character (©).
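As a small sketch (assuming Python 3, where unadorned literals are Unicode), the three escape forms can denote the same character:

```python
by_u4 = '\u00a9'                 # four hex digits
by_u8 = '\U000000a9'             # eight hex digits
by_name = '\N{COPYRIGHT SIGN}'   # standard Unicode name

all_equal = by_u4 == by_u8 == by_name   # True
print(by_u4, all_equal)
```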

Raw Unicode string literals in v2 start with ur, not ru; raw byte string literals in v2 start with br, not rb (in v3, you can start them with either br or rb).

Raw strings are not a different type from other strings

Raw strings are not a different type from ordinary strings; they are just an alternative syntax for literals of the usual two string types, byte strings and Unicode.

New in 3.6, formatted string literals let you inject formatted expressions into your strings, which are therefore no longer constants but subject to evaluation at execution time. From a syntactic point of view, they can be regarded just as another kind of string literal.
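A minimal sketch of a formatted string literal, assuming Python 3.6 or later (the names name and width are illustrative):

```python
name = 'world'
width = 7

# Expressions inside {} are evaluated when the literal is executed.
greeting = f'Hello, {name}!'

# A format specifier may follow a colon, and may itself use expressions.
padded = f'{3.14159:{width}.2f}'

print(greeting)         # Hello, world!
print(repr(padded))     # '   3.14'
```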

Multiple string literals of any kind — quoted, triple-quoted, raw, bytes, formatted, Unicode — can be adjacent, with optional whitespace in between (except that, in v3, you cannot mix bytes and Unicode in this way). The compiler concatenates such adjacent string literals into a single string object. In v2, if any literal in the concatenation is Unicode, the whole result is Unicode. Writing a long string literal in this way lets you present it readably across multiple physical lines and gives you an opportunity to insert comments about parts of the string. For example:

marypop = ('supercalifragilistic'  # Open paren->logical line continues
           'expialidocious')       # Indentation ignored in continuation

The string assigned to marypop is a single word of 34 characters.

Tuples

A tuple is an immutable ordered sequence of items. The items of a tuple are arbitrary objects and may be of different types. You can use mutable objects (e.g., lists) as tuple items; however, best practice is to avoid tuples with mutable items.
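The sketch below, assuming Python 3, illustrates one practical consequence of putting a mutable item in a tuple: the tuple itself cannot be rebound item by item, yet its list item can still change, and the tuple is no longer hashable (names are illustrative):

```python
pair = (1, [2, 3])
pair[1].append(4)          # mutates the list item, not the tuple

rebinding_failed = False
try:
    pair[0] = 99           # tuples do not support item assignment
except TypeError:
    rebinding_failed = True

hashable = True
try:
    hash(pair)             # fails because the list item is unhashable
except TypeError:
    hashable = False

print(pair, rebinding_failed, hashable)
```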

To denote a tuple, use a series of expressions (the items of the tuple) separated by commas (,); if every item is a literal, the whole assembly is a tuple literal. You may optionally place a redundant comma after the last item. You may group tuple items within parentheses, but the parentheses are necessary only where the commas would otherwise have another meaning (e.g., in function calls), or to denote empty or nested tuples. A tuple with exactly two items is also known as a pair. To create a tuple of one item, add a comma to the end of the expression. To denote an empty tuple, use an empty pair of parentheses. Here are some tuple literals, all in the optional parentheses (the parentheses are not optional in the last case):

(100, 200, 300)       # Tuple with three items
(3.14,)               # Tuple with 1 item needs trailing comma
()                    # Empty tuple (parentheses NOT optional)

You can also call the built-in type tuple to create a tuple. For example:

tuple('wow')

This builds a tuple equal to that denoted by the tuple literal:

('w', 'o', 'w')

tuple() without arguments creates and returns an empty tuple, like (). When x is iterable, tuple(x) returns a tuple whose items are the same as those in x.

Lists

A list is a mutable ordered sequence of items. The items of a list are arbitrary objects and may be of different types. To denote a list, use a series of expressions (the items of the list) separated by commas (,), within brackets ([]); if every item is a literal, the whole assembly is a list literal. You may optionally place a redundant comma after the last item. To denote an empty list, use an empty pair of brackets. Here are some example list literals:

[42, 3.14, 'hello']      # List with three items
[100]                    # List with one item
[]                       # Empty list

You can also call the built-in type list to create a list. For example:

list('wow')

This builds a list equal to that denoted by the list literal:

['w', 'o', 'w']

list() without arguments creates and returns an empty list, like []. When x is iterable, list(x) creates and returns a new list whose items are the same as those in x. You can also build lists with list comprehensions.
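A brief sketch, assuming Python 3, showing that list(x) builds a new list object and what a simple list comprehension looks like (names are illustrative):

```python
original = [1, 2, 3]
duplicate = list(original)     # a new list with the same items
duplicate.append(4)            # does not affect original

# A list comprehension builds a list from an expression and a loop.
squares = [n * n for n in original]

print(original, duplicate, squares)
```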

Sets

Python has two built-in set types, set and frozenset, to represent arbitrarily ordered collections of unique items. Items in a set may be of different types, but they must be hashable. Instances of type set are mutable, and thus, not hashable; instances of type frozenset are immutable and hashable. You can’t have a set whose items are sets, but you can have a set (or frozenset) whose items are frozensets. Sets and frozensets are not ordered.

To create a set, you can call the built-in type set with no argument (this means an empty set) or one argument that is iterable (this means a set whose items are those of the iterable). You can similarly build a frozenset by calling frozenset.

Alternatively, to denote a (nonfrozen, nonempty) set, use a series of expressions (the items of the set) separated by commas (,) and within braces ({}); if every item is a literal, the whole assembly is a set literal. You may optionally place a redundant comma after the last item. Some example sets (two literals, one not):

{42, 3.14, 'hello'}       # Literal for a set with three items
{100}                     # Literal for a set with one item
set()                     # Empty set (can't use {}—empty dict!)

You can also build nonfrozen sets with set comprehensions.
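The following sketch, assuming Python 3, pulls together the points about sets: uniqueness of items, the hashability requirement, and a set comprehension (names are illustrative):

```python
# Duplicates collapse: a set keeps only unique items.
lengths = {len(word) for word in ['a', 'bb', 'cc', 'ddd']}   # {1, 2, 3}

# frozensets are hashable, so they can be items of a set.
nested = {frozenset({1, 2}), frozenset({3})}

# Plain sets are mutable and unhashable, so they cannot be set items.
set_of_sets_failed = False
try:
    bad = {set([1, 2])}
except TypeError:
    set_of_sets_failed = True

print(lengths, len(nested), set_of_sets_failed)
```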

Dictionaries

A mapping is an arbitrary collection of objects indexed by nearly arbitrary values called keys. Mappings are mutable and, like sets but unlike sequences, are not (necessarily) ordered.

Python provides a single built-in mapping type: the dictionary type. Library and extension modules provide other mapping types, and you can write others yourself. Keys in a dictionary may be of different types, but they must be hashable. Values in a dictionary are arbitrary objects and may be of any type. An item in a dictionary is a key/value pair. You can think of a dictionary as an associative array (known in some other languages as an “unordered map,” “hash table,” or “hash”).

To denote a dictionary, you can use a series of colon-separated pairs of expressions (the pairs are the items of the dictionary) separated by commas (,) within braces ({}); if every expression is a literal, the whole assembly is a dict literal. You may optionally place a redundant comma after the last item. Each item in a dictionary is written as key:value, where key is an expression giving the item’s key and value is an expression giving the item’s value. If a key appears more than once in a dictionary expression, only the last item with that key is kept in the resulting dictionary object, because dictionaries do not allow duplicate keys. To denote an empty dictionary, use an empty pair of braces.

Here are some dictionary literals:

{'x':42, 'y':3.14, 'z':7}     # Dictionary with three items, str keys
{1:2, 3:4}                    # Dictionary with two items, int keys
{1:'za', 'br':23}             # Dictionary with mixed key types
{}                            # Empty dictionary

You can also call the built-in type dict to create a dictionary in a way that, while usually less concise, can sometimes be more readable. For example, the dictionaries in the preceding snippet can equivalently be written as:

dict(x=42, y=3.14, z=7)      # Dictionary with three items, str keys
dict([(1, 2), (3, 4)])       # Dictionary with two items, int keys
dict([(1,'za'), ('br',23)])  # Dictionary with mixed key types
dict()                       # Empty dictionary

dict() without arguments creates and returns an empty dictionary, like {}. When the argument x to dict is a mapping, dict returns a new dictionary object with the same keys and values as x. When x is iterable, the items in x must be pairs, and dict(x) returns a dictionary whose items (key/value pairs) are the same as the items in x. If a key value appears more than once in x, only the last item from x with that key value is kept in the resulting dictionary.

When you call dict, in addition to, or instead of, the positional argument x, you may pass named arguments, each with the syntax name=value, where name is an identifier to use as an item’s key and value is an expression giving the item’s value. When you call dict and pass both a positional argument and one or more named arguments, if a key appears both in the positional argument and as a named argument, Python associates to that key the value given with the named argument (i.e., the named argument “wins”).
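These rules can be sketched as follows, assuming Python 3 (the keys and values are made up for illustration):

```python
pairs = [('x', 1), ('y', 2), ('x', 3)]

# With a repeated key in the iterable, the last pair wins.
from_pairs = dict(pairs)              # {'x': 3, 'y': 2}

# A named argument overrides the same key from the positional argument.
merged = dict(pairs, y=20, z=30)      # {'x': 3, 'y': 20, 'z': 30}

print(from_pairs, merged)
```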

You can also create a dictionary by calling dict.fromkeys. The first argument is an iterable whose items become the keys of the dictionary; the second argument is the value that corresponds to each and every key (all keys initially map to the same value). If you omit the second argument, it defaults to None. For example:

dict.fromkeys('hello', 2)     # same as {'h':2, 'e':2, 'l':2, 'o':2}
dict.fromkeys([1, 2, 3])      # same as {1:None, 2:None, 3:None}

You can also build dicts with dict comprehensions.
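One case where a dict comprehension may be preferable to dict.fromkeys is when the values are mutable: fromkeys maps every key to the very same object. A minimal sketch, assuming Python 3 (names are illustrative):

```python
# A dict comprehension evaluates the value expression once per key.
squares = {n: n * n for n in range(4)}      # {0: 0, 1: 1, 2: 4, 3: 9}

# fromkeys uses one shared list for all keys...
shared = dict.fromkeys(['a', 'b'], [])
shared['a'].append(1)                       # also visible through 'b'

# ...while a comprehension builds a separate list for each key.
separate = {key: [] for key in ['a', 'b']}
separate['a'].append(1)

print(squares, shared, separate)
```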

None

The built-in None denotes a null object. None has no methods or other attributes. You can use None as a placeholder when you need a reference but you don’t care what object you refer to, or when you need to indicate that no object is there. Functions return None as their result unless they have specific return statements coded to return other values.
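A short sketch, assuming Python 3, of a function with no return statement and of None used as a "nothing there" marker (the function names are hypothetical):

```python
def log(message):
    print(message)            # no return statement

def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            return n
    return None               # explicit: no even number was found

result = log('hi')            # result is None
missing = find_first_even([1, 3, 5])

# Identity comparison with "is" is the usual way to test for None.
print(result is None, missing is None)
```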

Callables

In Python, callable types are those whose instances support the function call operation. Functions are callable. Python provides several built-in functions and supports user-defined functions. Generator functions are also callable.

Types are also callable, as we already saw for the dict, list, set, and tuple built-in types. Class objects (user-defined types) are also callable. Calling a type normally creates and returns a new instance of that type.

Other callables are methods, which are functions bound to class attributes, and instances of classes that supply a special method named __call__.
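The built-in callable(obj) reports whether an object supports the call operation. The sketch below, assuming Python 3, uses a hypothetical class Multiplier to show an instance made callable through __call__:

```python
class Multiplier:
    """Instances are callable because the class defines __call__."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return self.factor * value

double = Multiplier(2)         # calling the class creates an instance
doubled = double(21)           # calling the instance invokes __call__

checks = [callable(len), callable(int), callable(Multiplier),
          callable(double), callable(double.__call__), callable(42)]

print(doubled, checks)
```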

Boolean Values

Any data value in Python can be used as a truth value: true or false. Any nonzero number or nonempty container (e.g., string, tuple, list, set, or dictionary) is true. 0 (of any numeric type), None, and empty containers are false.

Beware using a float as a truth value

Be careful about using a floating-point number as a truth value: that’s like comparing the number for exact equality with zero, and floating-point numbers should almost never be compared for exact equality.
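The sketch below, assuming Python 3.5 or later (for math.isclose) and IEEE 754 doubles, shows how a result that is mathematically zero can still be true, and one possible tolerance-based alternative; the tolerance 1e-9 is an arbitrary choice for illustration:

```python
import math

residue = 0.1 + 0.2 - 0.3      # mathematically zero

# Rounding error leaves a tiny nonzero value, so the float is true.
is_truthy = bool(residue)

# Comparing against zero with an absolute tolerance is more robust.
is_effectively_zero = math.isclose(residue, 0.0, abs_tol=1e-9)

print(residue, is_truthy, is_effectively_zero)
```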

The built-in type bool is a subclass of int. The only two values of type bool are True and False, which have string representations of 'True' and 'False', but also numerical values of 1 and 0, respectively. Several built-in functions return bool results, as do comparison operators.

You can call bool(x) with any x as the argument. The result is True when x is true and False when x is false. Good Python style is not to use such calls when they are redundant, as they most often are: always write if x:, never any of if bool(x):, if x is True:, if x==True:, if bool(x)==True:. However, you can use bool(x) to count the number of true items in a sequence. For example:

def count_trues(seq): return sum(bool(x) for x in seq)

In this example, the bool call ensures each item of seq is counted as 0 (if false) or 1 (if true), so count_trues is more general than sum(seq) would be.
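A usage sketch of count_trues, assuming Python 3 (the sample list is made up for illustration):

```python
def count_trues(seq):
    return sum(bool(x) for x in seq)

items = [0, 1, 'a', '', [], [0], None, 2.5]
count = count_trues(items)     # 4: the true items are 1, 'a', [0], 2.5

# sum(items) cannot add an int and a str, so it raises TypeError here.
plain_sum_failed = False
try:
    sum(items)
except TypeError:
    plain_sum_failed = True

print(count, plain_sum_failed)
```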

When we write "expression is true", we mean that bool(expression) would return True.
