4.7. Sequence Unpack Slice

4.7.1. Rationale

  • Slice argument must be int (positive, negative or zero)

  • Positive Index starts with 0

  • Negative index starts with -1

4.7.2. Slice Forwards

  • sequence[start:stop]

>>> data = 'abcde'
>>> data[0:3]
'abc'
>>> data = 'abcde'
>>> data[2:5]
'cde'

4.7.3. Slice Defaults

  • sequence[start:stop]

  • start defaults to 0

  • stop defaults to len(sequence)

>>> data = 'abcde'
>>> data[:3]
'abc'
>>> data = 'abcde'
>>> data[3:]
'de'
>>> data = 'abcde'
>>> data[:]
'abcde'

4.7.4. Slice Backwards

  • Negative index starts from the end and go right to left

>>> data = 'abcde'
>>> data[-3:-1]
'cd'
>>> data = 'abcde'
>>> data[-3:]
'cde'
>>> data = 'abcde'
>>> data[0:-3]
'ab'
>>> data = 'abcde'
>>> data[:-3]
'ab'
>>> data = 'abcde'
>>> data[-3:0]
''

4.7.5. Step Forward

  • Every n-th element

  • sequence[start:stop:step]

  • start defaults to 0

  • stop defaults to len(sequence)

  • step defaults to 1

>>> data = 'abcde'
>>> data[::1]
'abcde'
>>> data = 'abcde'
>>> data[::2]
'ace'
>>> data = 'abcde'
>>> data[::3]
'ad'
>>> data = 'abcde'
>>> data[1:4:2]
'bd'

4.7.6. Step Backward

  • Every n-th element

  • sequence[start:stop:step]

  • start defaults to 0

  • stop defaults to len(sequence)

  • step defaults to 1

>>> data = 'abcde'
>>> data[::-1]
'edcba'
>>> data = 'abcde'
>>> data[::-2]
'eca'
>>> data = 'abcde'
>>> data[::-3]
'eb'
>>> data = 'abcde'
>>> data[4:1:-2]
'ec'

4.7.7. Slice Errors

>>> data = 'abcde'
>>> data[::0]
Traceback (most recent call last):
ValueError: slice step cannot be zero
>>> data = 'abcde'
>>> data[::1.0]
Traceback (most recent call last):
TypeError: slice indices must be integers or None or have an __index__ method

4.7.8. Out of Range

>>> data = 'abcde'
>>> data[:100]
'abcde'
>>> data = 'abcde'
>>> data[100:]
''

4.7.9. Slice str

>>> data = 'abcde'
>>>
>>>
>>> data[0:3]
'abc'
>>> data[3:5]
'de'
>>> data[:3]
'abc'
>>> data[3:]
'de'
>>> data[::1]
'abcde'
>>> data[::-1]
'edcba'
>>> data[::2]
'ace'
>>> data[::-2]
'eca'
>>> data[1::2]
'bd'
>>> data[1:4:2]
'bd'

4.7.10. Slice tuple

>>> data = ('a', 'b', 'c', 'd', 'e')
>>>
>>>
>>> data[0:3]
('a', 'b', 'c')
>>> data[3:5]
('d', 'e')
>>> data[:3]
('a', 'b', 'c')
>>> data[3:]
('d', 'e')
>>> data[::2]
('a', 'c', 'e')
>>> data[::-1]
('e', 'd', 'c', 'b', 'a')
>>> data[1::2]
('b', 'd')
>>> data[1:4:2]
('b', 'd')

4.7.11. Slice list

>>> data = ['a', 'b', 'c', 'd', 'e']
>>>
>>>
>>> data[0:3]
['a', 'b', 'c']
>>> data[3:5]
['d', 'e']
>>> data[:3]
['a', 'b', 'c']
>>> data[3:]
['d', 'e']
>>> data[::2]
['a', 'c', 'e']
>>> data[::-1]
['e', 'd', 'c', 'b', 'a']
>>> data[1::2]
['b', 'd']
>>> data[1:4:2]
['b', 'd']

4.7.12. Slice set

Slicing set is not possible:

>>> data = {'a', 'b', 'c', 'd', 'e'}
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'set' object is not subscriptable

4.7.13. Nested Sequences

>>> DATA = [
...     ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> DATA[1:]  
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> DATA[-3:]  
[(6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]

4.7.14. Column Selection

Column selection unfortunately does not work on list:

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
>>> data[:, 1]
Traceback (most recent call last):
TypeError: list indices must be integers or slices, not tuple
>>>
>>> data[:][1]
[4, 5, 6]

However this syntax is valid in numpy and pandas.

4.7.15. Index Arithmetic

>>> text = 'We choose to go to the Moon!'
>>> first = 23
>>> last = 28
>>> step = 2
>>>
>>> text[first:last]
'Moon!'
>>> text[first:last-1]
'Moon'
>>> text[first:last:step]
'Mo!'
>>> text[first:last-1:step]
'Mo'

4.7.16. Slice Function

  • Every n-th element

  • sequence[start:stop:step]

  • start defaults to 0

  • stop defaults to len(sequence)

  • step defaults to 1

>>> text = 'We choose to go to the Moon!'
>>>
>>> q = slice(23, 27)
>>> text[q]
'Moon'
>>>
>>> q = slice(None, 9)
>>> text[q]
'We choose'
>>>
>>> q = slice(23, None)
>>> text[q]
'Moon!'
>>>
>>> q = slice(23, None, 2)
>>> text[q]
'Mo!'
>>>
>>> q = slice(None, None, 2)
>>> text[q]
'W hoet ot h on'

4.7.17. Use Case - 0x01

>>> from pprint import pprint
>>>
>>>
>>> DATA = [
...     ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> pprint(DATA[1:])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> pprint(DATA[1::2])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor')]
>>>
>>> pprint(DATA[1::-2])
[(5.8, 2.7, 5.1, 1.9, 'virginica')]
>>>
>>> pprint(DATA[:1:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa')]
>>>
>>> pprint(DATA[:-5:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'), (6.3, 2.9, 5.6, 1.8, 'virginica')]
>>>
>>> pprint(DATA[1:-5:-2])
[]

4.7.18. Use Case - 0x02

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[::2]  
[[1, 2, 3],
 [7, 8, 9]]
>>>
>>> data[::2][1]
[7, 8, 9]
>>>
>>> data[::2][:1]
[[1, 2, 3]]
>>>
>>> data[::2][1][1:]
[8, 9]

4.7.19. Use Case - 0x03

>>> text = 'We choose to go to the Moon!'
>>> word = 'Moon'
>>>
>>>
>>> start = text.find(word)
>>> stop = start + len(word)
>>>
>>> text[start:stop]
'Moon'
>>>
>>> text[:start]
'We choose to go to the '
>>>
>>> text[stop:]
'!'
>>>
>>> text[:start] + text[stop:]
'We choose to go to the !'

4.7.20. Assignments

Code 4.15. Solution
"""
* Assignment: Sequence Slice Text
* Required: yes
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
    1. Remove title and military rank in each variable
    2. Remove also whitespaces at the beginning and end of a text
    3. Use only `slice` to clean text
    4. Run doctests - all must succeed

Polish:
    1. Usuń tytuł naukowy i stopień wojskowy z każdej zmiennej
    2. Usuń również białe znaki na początku i końcu tekstu
    3. Użyj tylko `slice` do oczyszczenia tekstu
    4. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert a is not Ellipsis, \
    'Assign result to variable: `a`'
    >>> assert b is not Ellipsis, \
    'Assign result to variable: `b`'
    >>> assert c is not Ellipsis, \
    'Assign result to variable: `c`'
    >>> assert d is not Ellipsis, \
    'Assign result to variable: `d`'
    >>> assert e is not Ellipsis, \
    'Assign result to variable: `e`'
    >>> assert f is not Ellipsis, \
    'Assign result to variable: `f`'
    >>> assert g is not Ellipsis, \
    'Assign result to variable: `g`'
    >>> assert type(a) is str, \
    'Variable `a` has invalid type, should be str'
    >>> assert type(b) is str, \
    'Variable `b` has invalid type, should be str'
    >>> assert type(c) is str, \
    'Variable `c` has invalid type, should be str'
    >>> assert type(d) is str, \
    'Variable `d` has invalid type, should be str'
    >>> assert type(e) is str, \
    'Variable `e` has invalid type, should be str'
    >>> assert type(f) is str, \
    'Variable `f` has invalid type, should be str'
    >>> assert type(g) is str, \
    'Variable `g` has invalid type, should be str'

    >>> example
    'Mark Watney'
    >>> a
    'Jan Twardowski'
    >>> b
    'Jan Twardowski'
    >>> c
    'Mark Watney'
    >>> d
    'Melissa Lewis'
    >>> e
    'Ryan Stone'
    >>> f
    'Ryan Stone'
    >>> g
    'Jan Twardowski'
"""

EXAMPLE = 'lt. Mark Watney, PhD'
A = 'dr hab. inż. Jan Twardowski, prof. AATC'
B = 'gen. pil. Jan Twardowski'
C = 'Mark Watney, PhD'
D = 'lt. col. ret. Melissa Lewis'
E = 'dr n. med. Ryan Stone'
F = 'Ryan Stone, MD-PhD'
G = 'lt. col. Jan Twardowski\t'

example = EXAMPLE[4:-5]

# str: expected result: 'Jan Twardowski'
a = ...

# str: expected result: 'Jan Twardowski'
b = ...

# str: expected result: 'Mark Watney'
c = ...

# str: expected result: 'Melissa Lewis'
d = ...

# str: expected result: 'Ryan Stone'
e = ...

# str: expected result: 'Ryan Stone'
f = ...

# str: expected result: 'Jan Twardowski'
g = ...

Code 4.16. Solution
"""
* Assignment: Sequence Slice Substr
* Required: yes
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Use `str.find()` and slicing
    2. Print `TEXT` without fragment from `REMOVE`
    3. Output should be: 'We choose the Moon!'
    4. Do not use `str.replace()`
    5. Run doctests - all must succeed

Polish:
    1. Użyj `str.find()` oraz wycinania
    2. Wypisz `TEXT` bez fragmentu znajdującego się w `REMOVE`
    3. Wynik powinien być: 'We choose the Moon!'
    4. Nie używaj `str.replace()`
    5. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is str, \
    'Variable `result` has invalid type, should be str'

    >>> result
    'We choose the Moon!'
"""

TEXT = 'We choose to go to the Moon!'
REMOVE = 'to go to '

# str: TEXT without REMOVE part
result = ...

Code 4.17. Solution
"""
* Assignment: Sequence Slice Sequence
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Create set `result` with every second element from `a` and `b`
    2. Run doctests - all must succeed

Polish:
    1. Stwórz zbiór `result` z co drugim elementem `a` i `b`
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is set, \
    'Variable `result` has invalid type, should be set'

    >>> result
    {0, 2, 4}
"""

a = (0, 1, 2, 3)
b = [2, 3, 4, 5]

# set[int]: with every second element from `a` and `b`
result = ...

Code 4.18. Solution
"""
* Assignment: Sequence Slice Header/Data
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Separate header (first line) from data:
       a. Define `header: tuple[str]` with header
       b. Define `data: list[tuple]` with other data without header
    2. Run doctests - all must succeed

Polish:
    1. Odseparuj nagłówek (pierwsza linia) od danych:
       a. Zdefiniuj `header: tuple[str]` z nagłówkiem
       b. Zdefiniuj `data: list[tuple]` z danymi bez nagłówka
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert header is not Ellipsis, \
    'Assign result to variable: `header`'
    >>> assert data is not Ellipsis, \
    'Assign result to variable: `data`'
    >>> assert type(header) is tuple, \
    'Variable `header` has invalid type, should be tuple'
    >>> assert all(type(x) is tuple for x in data), \
    'All elements in `data` should be tuple'
    >>> assert header not in data, \
    'Header should not be in `data`'

    >>> header
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species')

    >>> data  # doctest: +NORMALIZE_WHITESPACE
    [(5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
     (6.3, 2.9, 5.6, 1.8, 'virginica'),
     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
     (4.7, 3.2, 1.3, 0.2, 'setosa'),
     (7.0, 3.2, 4.7, 1.4, 'versicolor'),
     (7.6, 3.0, 6.6, 2.1, 'virginica'),
     (4.9, 3.0, 1.4, 0.2, 'setosa'),
     (4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica')]


# tuple[str]: with row at index 0 from DATA
header = ...

# list[tuple]: with rows at all the other indexes from DATA
data = ...

Code 4.19. Solution
"""
* Assignment: Sequence Slice Train/Test
* Required: yes
* Complexity: easy
* Lines of code: 4 lines
* Time: 8 min

English:
    1. Divide `data` into two lists:
       a. `train`: 60% - training data
       b. `test`: 40% - testing data
    2. Calculate split point:
       a. `data` length multiplied by percent
       b. From `data` slice training data from start to split
       c. From `data` slice test data from split to end
    3. Run doctests - all must succeed

Polish:
    1. Podziel `data` na dwie listy:
       a. `train`: 60% - dane do uczenia
       b. `test`: 40% - dane do testów
    2. Aby to zrobić wylicz punkt podziału:
       a. Długość `data` razy procent
       c. Z `data` wytnij do uczenia rekordy od początku do punktu podziału
       d. Z `data` zapisz do testów rekordy od punktu podziału do końca
    3. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert split is not Ellipsis, \
    'Assign result to variable: `split`'
    >>> assert train is not Ellipsis, \
    'Assign result to variable: `train`'
    >>> assert test is not Ellipsis, \
    'Assign result to variable: `test`'
    >>> assert type(split) is int, \
    'Variable `split` has invalid type, should be int'
    >>> assert type(train) is list, \
    'Variable `train` has invalid type, should be list'
    >>> assert type(train) is list, \
    'Variable `train` has invalid type, should be list'
    >>> assert type(test) is list, \
    'Variable `test` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in train), \
    'All elements in `train` should be tuple'
    >>> assert all(type(x) is tuple for x in test), \
    'All elements in `test` should be tuple'

    >>> split
    6

    >>> train  # doctest: +NORMALIZE_WHITESPACE
    [(5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
     (6.3, 2.9, 5.6, 1.8, 'virginica'),
     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
     (4.7, 3.2, 1.3, 0.2, 'setosa')]

    >>> test  # doctest: +NORMALIZE_WHITESPACE
    [(7.0, 3.2, 4.7, 1.4, 'versicolor'),
     (7.6, 3.0, 6.6, 2.1, 'virginica'),
     (4.9, 3.0, 1.4, 0.2, 'setosa'),
     (4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica')]


header = DATA[0]
data = DATA[1:]

# int: `data` length multiplied by percent
split = ...

# list[tuple]: first 60% from data
train = ...

# list[tuple]: last 40% from data
test = ...