# 4.7. Sequence Unpack Slice

## 4.7.1. Rationale

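• Slice arguments must be int (positive, negative or zero) or None; only step cannot be zero

• Positive indexes start at 0 and count from the beginning of the sequence

• Negative indexes start at -1 and count from the end of the sequence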
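
The index rules above can be sketched as follows. This is only an illustration; the variable names are not part of the original material.

```python
data = 'abcde'

# Positive indexes count from the left, starting at 0
first = data[0]

# Negative indexes count from the right, starting at -1
last = data[-1]

# A valid negative index i refers to the same element as len(data) + i
same = data[-2] == data[len(data) - 2]
```

Here `first` is `'a'`, `last` is `'e'`, and `same` is `True` because both expressions refer to `'d'`.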

## 4.7.2. Slice Forwards

• sequence[start:stop] (start is included, stop is excluded)

>>> data = 'abcde'
>>> data[0:3]
'abc'

>>> data = 'abcde'
>>> data[2:5]
'cde'
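
A small sketch of why the stop index is excluded: as long as both bounds lie within the sequence, the length of the result is simply stop minus start. The names `start`, `stop` and `part` are chosen here for illustration.

```python
data = 'abcde'
start, stop = 1, 4

# Takes the elements at indexes 1, 2 and 3; index 4 is excluded
part = data[start:stop]

# For in-range bounds the result has stop - start elements
length = len(part)
```

With these values `part` is `'bcd'` and `length` is `3`.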


## 4.7.3. Slice Defaults

• sequence[start:stop]

• start defaults to 0

• stop defaults to len(sequence)

>>> data = 'abcde'
>>> data[:3]
'abc'

>>> data = 'abcde'
>>> data[3:]
'de'

>>> data = 'abcde'
>>> data[:]
'abcde'
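
One consequence of these defaults, shown as a hedged sketch: splitting a sequence at any point `n` and joining the two halves gives back the original, and for a list `[:]` produces a new (shallow) copy. The split point `n` and the list `numbers` are arbitrary examples.

```python
data = 'abcde'
n = 2  # arbitrary split point, chosen only for illustration

head = data[:n]   # same as data[0:n]
tail = data[n:]   # same as data[n:len(data)]
rejoined = head + tail

# For a list, [:] creates a new list object with the same elements
numbers = [1, 2, 3]
clone = numbers[:]
is_new_object = clone is not numbers
```

Here `head` is `'ab'`, `tail` is `'cde'`, and `rejoined` equals `data`.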


## 4.7.4. Slice Backwards

• Negative indexes count from the end of the sequence, going right to left

>>> data = 'abcde'
>>> data[-3:-1]
'cd'

>>> data = 'abcde'
>>> data[-3:]
'cde'

>>> data = 'abcde'
>>> data[0:-3]
'ab'

>>> data = 'abcde'
>>> data[:-3]
'ab'

>>> data = 'abcde'
>>> data[-3:0]
''
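
The last example returns an empty string, which may be surprising. A minimal sketch of the reason, assuming the default step of 1: a negative bound is first translated to a position counted from the left, and the slice is then read left to right.

```python
data = 'abcde'

# A negative bound i within range behaves like len(data) + i
start = len(data) - 3     # the position that -3 refers to

# Both spellings select the same elements
equivalent = data[-3:] == data[start:]

# With the default step, slicing moves left to right,
# so a stop that lies to the left of start gives an empty result
empty = data[-3:0]        # behaves like data[2:0]
```

Here `start` is `2`, `equivalent` is `True`, and `empty` is `''`.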


## 4.7.5. Step Forward

• Every n-th element

• sequence[start:stop:step]

• start defaults to 0

• stop defaults to len(sequence)

• step defaults to 1

>>> data = 'abcde'
>>> data[::1]
'abcde'

>>> data = 'abcde'
>>> data[::2]
'ace'

>>> data = 'abcde'
>>> data[::3]
'ad'

>>> data = 'abcde'
>>> data[1:4:2]
'bd'


## 4.7.6. Step Backward

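• Every n-th element, going right to left

• sequence[start:stop:step]

• With a negative step, start defaults to the last element

• With a negative step, stop defaults to just before the first element

• step still defaults to 1, so a negative step must be given explicitly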

>>> data = 'abcde'
>>> data[::-1]
'edcba'

>>> data = 'abcde'
>>> data[::-2]
'eca'

>>> data = 'abcde'
>>> data[::-3]
'eb'

>>> data = 'abcde'
>>> data[4:1:-2]
'ec'
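
To see which bounds Python actually uses when they are omitted, one option is the `slice.indices()` method, which resolves a slice against a given length. The sketch below is illustrative; note that the resolved values are meant for `range()`, not for writing back into a slice literal.

```python
data = 'abcde'

# slice.indices(length) resolves omitted (None) bounds for that length
forward = slice(None, None, 1).indices(len(data))    # (0, 5, 1)
backward = slice(None, None, -1).indices(len(data))  # (4, -1, -1)

# In range() terms, a stop of -1 means "stop just before index 0"
reversed_text = ''.join(data[i] for i in range(*backward))

# Written as literal slice bounds, -1 is read as "the last element",
# so this slice is empty rather than reversed
literal = data[4:-1:-1]
```

`reversed_text` is `'edcba'`, the same as `data[::-1]`, while `literal` is `''`.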


## 4.7.7. Slice Errors

>>> data = 'abcde'
>>> data[::0]
Traceback (most recent call last):
ValueError: slice step cannot be zero

>>> data = 'abcde'
>>> data[::1.0]
Traceback (most recent call last):
TypeError: slice indices must be integers or None or have an __index__ method


## 4.7.8. Out of Range

>>> data = 'abcde'
>>> data[:100]
'abcde'

>>> data = 'abcde'
>>> data[100:]
''
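
Slicing is more forgiving than indexing here. The sketch below contrasts the two: out-of-range slice bounds are clamped to the sequence, while an out-of-range index raises `IndexError`.

```python
data = 'abcde'

# Out-of-range slice bounds are silently clamped
clamped = data[:100]
beyond = data[100:]

# An out-of-range index raises an exception instead
try:
    data[100]
except IndexError as error:
    message = str(error)
```

Here `clamped` is `'abcde'`, `beyond` is `''`, and `message` holds the text of the `IndexError`.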


## 4.7.9. Slice str

>>> data = 'abcde'
>>>
>>>
>>> data[0:3]
'abc'
>>> data[3:5]
'de'
>>> data[:3]
'abc'
>>> data[3:]
'de'
>>> data[::1]
'abcde'
>>> data[::-1]
'edcba'
>>> data[::2]
'ace'
>>> data[::-2]
'eca'
>>> data[1::2]
'bd'
>>> data[1:4:2]
'bd'


## 4.7.10. Slice tuple

>>> data = ('a', 'b', 'c', 'd', 'e')
>>>
>>>
>>> data[0:3]
('a', 'b', 'c')
>>> data[3:5]
('d', 'e')
>>> data[:3]
('a', 'b', 'c')
>>> data[3:]
('d', 'e')
>>> data[::2]
('a', 'c', 'e')
>>> data[::-1]
('e', 'd', 'c', 'b', 'a')
>>> data[1::2]
('b', 'd')
>>> data[1:4:2]
('b', 'd')


## 4.7.11. Slice list

>>> data = ['a', 'b', 'c', 'd', 'e']
>>>
>>>
>>> data[0:3]
['a', 'b', 'c']
>>> data[3:5]
['d', 'e']
>>> data[:3]
['a', 'b', 'c']
>>> data[3:]
['d', 'e']
>>> data[::2]
['a', 'c', 'e']
>>> data[::-1]
['e', 'd', 'c', 'b', 'a']
>>> data[1::2]
['b', 'd']
>>> data[1:4:2]
['b', 'd']


## 4.7.12. Slice set

Slicing a set is not possible, because a set is unordered and does not support indexing:

>>> data = {'a', 'b', 'c', 'd', 'e'}
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'set' object is not subscriptable
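
If a slice of the set's elements is needed anyway, one possible workaround (an assumption about intent, not something the original prescribes) is to convert the set to an ordered sequence first. Sorting makes the order explicit and repeatable.

```python
data = {'a', 'b', 'c', 'd', 'e'}

# sorted() returns a list, which has a defined order and can be sliced
ordered = sorted(data)
first_three = ordered[:3]
```

With this data `first_three` is `['a', 'b', 'c']`.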


## 4.7.13. Nested Sequences

>>> DATA = [
...     ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> DATA[1:]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> DATA[-3:]  # doctest: +NORMALIZE_WHITESPACE
[(6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]


## 4.7.14. Column Selection

Unfortunately, column selection does not work on a list of lists:

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
>>> data[:, 1]
Traceback (most recent call last):
TypeError: list indices must be integers or slices, not tuple
>>>
>>> data[:][1]
[4, 5, 6]


However, this syntax is valid for numpy arrays, and pandas offers it through DataFrame.iloc.
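
With plain lists, a column can still be selected by indexing or slicing each row in turn. The sketch below uses a list comprehension, which this section has not introduced, so treat it as a preview rather than the chapter's own method.

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# Take the element at index 1 from every row
column = [row[1] for row in data]

# Take the first two elements from every row
columns = [row[:2] for row in data]
```

Here `column` is `[2, 5, 8]` and `columns` is `[[1, 2], [4, 5], [7, 8]]`.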

## 4.7.15. Index Arithmetic

>>> text = 'We choose to go to the Moon!'
>>> first = 23
>>> last = 28
>>> step = 2
>>>
>>> text[first:last]
'Moon!'
>>> text[first:last-1]
'Moon'
>>> text[first:last:step]
'Mo!'
>>> text[first:last-1:step]
'Mo'


## 4.7.16. Slice Function

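• slice(start, stop, step) creates a slice object that can be stored in a variable and reused

• sequence[slice(start, stop, step)] is equivalent to sequence[start:stop:step]

• None stands for an omitted argument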

• start defaults to 0

• stop defaults to len(sequence)

• step defaults to 1

>>> text = 'We choose to go to the Moon!'
>>>
>>> q = slice(23, 27)
>>> text[q]
'Moon'
>>>
>>> q = slice(None, 9)
>>> text[q]
'We choose'
>>>
>>> q = slice(23, None)
>>> text[q]
'Moon!'
>>>
>>> q = slice(23, None, 2)
>>> text[q]
'Mo!'
>>>
>>> q = slice(None, None, 2)
>>> text[q]
'W hoet ot h on'
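
One place where slice objects may pay off is naming the fields of fixed-width text. The record layout below (two 10-character name columns followed by an age) is hypothetical and exists only for this sketch.

```python
# Hypothetical fixed-width record: 10 characters per name column, then the age
record = 'Mark'.ljust(10) + 'Watney'.ljust(10) + '41'

# Named slices document the layout in one place
FIRSTNAME = slice(0, 10)
LASTNAME = slice(10, 20)
AGE = slice(20, None)

firstname = record[FIRSTNAME].rstrip()
lastname = record[LASTNAME].rstrip()
age = int(record[AGE])

# A slice object exposes its arguments as attributes
bounds = (AGE.start, AGE.stop, AGE.step)
```

Here `firstname` is `'Mark'`, `lastname` is `'Watney'`, `age` is `41`, and `bounds` is `(20, None, None)`. The same named slices could be applied to any other record with the same layout.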


## 4.7.17. Use Case - 0x01

>>> from pprint import pprint
>>>
>>>
>>> DATA = [
...     ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> pprint(DATA[1:])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> pprint(DATA[1::2])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor')]
>>>
>>> pprint(DATA[1::-2])
[(5.8, 2.7, 5.1, 1.9, 'virginica')]
>>>
>>> pprint(DATA[:1:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa')]
>>>
>>> pprint(DATA[:-5:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'), (6.3, 2.9, 5.6, 1.8, 'virginica')]
>>>
>>> pprint(DATA[1:-5:-2])
[]


## 4.7.18. Use Case - 0x02

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[::2]
[[1, 2, 3], [7, 8, 9]]
>>>
>>> data[::2][1]
[7, 8, 9]
>>>
>>> data[::2][:1]
[[1, 2, 3]]
>>>
>>> data[::2][1][1:]
[8, 9]


## 4.7.19. Use Case - 0x03

>>> text = 'We choose to go to the Moon!'
>>> word = 'Moon'
>>>
>>>
>>> start = text.find(word)
>>> stop = start + len(word)
>>>
>>> text[start:stop]
'Moon'
>>>
>>> text[:start]
'We choose to go to the '
>>>
>>> text[stop:]
'!'
>>>
>>> text[:start] + text[stop:]
'We choose to go to the !'
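
One caveat with this pattern: `str.find()` returns -1 when the word is missing, and -1 is also a valid slice bound meaning "the last character". The sketch below, which uses the made-up search word 'Mars', shows the silent mistake and one possible guard.

```python
text = 'We choose to go to the Moon!'
word = 'Mars'   # deliberately absent from text

start = text.find(word)   # -1, because the word was not found

# Unguarded, -1 is treated as an ordinary negative index
# and the result is silently wrong instead of raising an error
unguarded = text[:start] + text[start + len(word):]

# Guarded version: leave the text unchanged when nothing was found
if start == -1:
    result = text
else:
    stop = start + len(word)
    result = text[:start] + text[stop:]
```

Here `result` equals the original `text`, while `unguarded` does not.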


## 4.7.20. Assignments

"""
* Assignment: Sequence Slice Text
* Required: yes
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
1. Remove the title and military rank from each variable
2. Also remove whitespace at the beginning and end of the text
3. Use only slicing to clean the text
4. Run doctests - all must succeed

Polish:
1. Usuń tytuł naukowy i stopień wojskowy z każdej zmiennej
2. Usuń również białe znaki na początku i końcu tekstu
3. Użyj tylko slice do oczyszczenia tekstu
4. Uruchom doctesty - wszystkie muszą się powieść

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert a is not Ellipsis, \
'Assign result to variable: a'
>>> assert b is not Ellipsis, \
'Assign result to variable: b'
>>> assert c is not Ellipsis, \
'Assign result to variable: c'
>>> assert d is not Ellipsis, \
'Assign result to variable: d'
>>> assert e is not Ellipsis, \
'Assign result to variable: e'
>>> assert f is not Ellipsis, \
'Assign result to variable: f'
>>> assert g is not Ellipsis, \
'Assign result to variable: g'
>>> assert type(a) is str, \
'Variable a has invalid type, should be str'
>>> assert type(b) is str, \
'Variable b has invalid type, should be str'
>>> assert type(c) is str, \
'Variable c has invalid type, should be str'
>>> assert type(d) is str, \
'Variable d has invalid type, should be str'
>>> assert type(e) is str, \
'Variable e has invalid type, should be str'
>>> assert type(f) is str, \
'Variable f has invalid type, should be str'
>>> assert type(g) is str, \
'Variable g has invalid type, should be str'

>>> example
'Mark Watney'
>>> a
'Jan Twardowski'
>>> b
'Jan Twardowski'
>>> c
'Mark Watney'
>>> d
'Melissa Lewis'
>>> e
'Ryan Stone'
>>> f
'Ryan Stone'
>>> g
'Jan Twardowski'
"""

EXAMPLE = 'lt. Mark Watney, PhD'
A = 'dr hab. inż. Jan Twardowski, prof. AATC'
B = 'gen. pil. Jan Twardowski'
C = 'Mark Watney, PhD'
D = 'lt. col. ret. Melissa Lewis'
E = 'dr n. med. Ryan Stone'
F = 'Ryan Stone, MD-PhD'
G = 'lt. col. Jan Twardowski\t'

example = EXAMPLE[4:-5]

# str: expected result: 'Jan Twardowski'
a = ...

# str: expected result: 'Jan Twardowski'
b = ...

# str: expected result: 'Mark Watney'
c = ...

# str: expected result: 'Melissa Lewis'
d = ...

# str: expected result: 'Ryan Stone'
e = ...

# str: expected result: 'Ryan Stone'
f = ...

# str: expected result: 'Jan Twardowski'
g = ...


"""
* Assignment: Sequence Slice Substr
* Required: yes
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
1. Use str.find() and slicing
2. Print TEXT without the fragment given in REMOVE
3. Output should be: 'We choose the Moon!'
4. Do not use str.replace()
5. Run doctests - all must succeed

Polish:
1. Użyj str.find() oraz wycinania
2. Wypisz TEXT bez fragmentu znajdującego się w REMOVE
3. Wynik powinien być: 'We choose the Moon!'
4. Nie używaj str.replace()
5. Uruchom doctesty - wszystkie muszą się powieść

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert result is not Ellipsis, \
'Assign result to variable: result'
>>> assert type(result) is str, \
'Variable result has invalid type, should be str'

>>> result
'We choose the Moon!'
"""

TEXT = 'We choose to go to the Moon!'
REMOVE = 'to go to '

# str: TEXT without REMOVE part
result = ...


"""
* Assignment: Sequence Slice Sequence
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
1. Create set result with every second element from a and b
2. Run doctests - all must succeed

Polish:
1. Stwórz zbiór result z co drugim elementem a i b
2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert result is not Ellipsis, \
'Assign result to variable: result'
>>> assert type(result) is set, \
'Variable result has invalid type, should be set'

>>> result
{0, 2, 4}
"""

a = (0, 1, 2, 3)
b = [2, 3, 4, 5]

# set[int]: with every second element from a and b
result = ...


"""
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
1. Separate header (first line) from data:
a. Define header: tuple[str] with header
b. Define data: list[tuple] with other data without header
2. Run doctests - all must succeed

Polish:
1. Odseparuj nagłówek (pierwsza linia) od danych:
a. Zdefiniuj header: tuple[str] z nagłówkiem
b. Zdefiniuj data: list[tuple] z danymi bez nagłówka
2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert header is not Ellipsis, \
'Assign result to variable: header'
>>> assert data is not Ellipsis, \
'Assign result to variable: data'
>>> assert type(header) is tuple, \
'Variable header has invalid type, should be tuple'
>>> assert all(type(x) is tuple for x in data), \
'All elements in data should be tuple'
>>> assert header not in data, \
'Header should not be in data'

>>> header
('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species')

>>> data  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor'),
(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa'),
(7.0, 3.2, 4.7, 1.4, 'versicolor'),
(7.6, 3.0, 6.6, 2.1, 'virginica'),
(4.9, 3.0, 1.4, 0.2, 'setosa'),
(4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

DATA = [
('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor'),
(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa'),
(7.0, 3.2, 4.7, 1.4, 'versicolor'),
(7.6, 3.0, 6.6, 2.1, 'virginica'),
(4.9, 3.0, 1.4, 0.2, 'setosa'),
(4.9, 2.5, 4.5, 1.7, 'virginica')]

# tuple[str]: with row at index 0 from DATA
header = ...

# list[tuple]: with rows at all the other indexes from DATA
data = ...


"""
* Assignment: Sequence Slice Train/Test
* Required: yes
* Complexity: easy
* Lines of code: 4 lines
* Time: 8 min

English:
1. Divide data into two lists:
a. train: 60% - training data
b. test: 40% - testing data
2. Calculate split point:
a. data length multiplied by percent
b. From data slice training data from start to split
c. From data slice test data from split to end
3. Run doctests - all must succeed

Polish:
1. Podziel data na dwie listy:
a. train: 60% - dane do uczenia
b. test: 40% - dane do testów
2. Aby to zrobić wylicz punkt podziału:
a. Długość data razy procent
b. Z data wytnij do uczenia rekordy od początku do punktu podziału
c. Z data zapisz do testów rekordy od punktu podziału do końca
3. Uruchom doctesty - wszystkie muszą się powieść

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert split is not Ellipsis, \
'Assign result to variable: split'
>>> assert train is not Ellipsis, \
'Assign result to variable: train'
>>> assert test is not Ellipsis, \
'Assign result to variable: test'
>>> assert type(split) is int, \
'Variable split has invalid type, should be int'
>>> assert type(train) is list, \
'Variable train has invalid type, should be list'
>>> assert type(test) is list, \
'Variable test has invalid type, should be list'
>>> assert all(type(x) is tuple for x in train), \
'All elements in train should be tuple'
>>> assert all(type(x) is tuple for x in test), \
'All elements in test should be tuple'

>>> split
6

>>> train  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor'),
(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa')]

>>> test  # doctest: +NORMALIZE_WHITESPACE
[(7.0, 3.2, 4.7, 1.4, 'versicolor'),
(7.6, 3.0, 6.6, 2.1, 'virginica'),
(4.9, 3.0, 1.4, 0.2, 'setosa'),
(4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

DATA = [
('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor'),
(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa'),
(7.0, 3.2, 4.7, 1.4, 'versicolor'),
(7.6, 3.0, 6.6, 2.1, 'virginica'),
(4.9, 3.0, 1.4, 0.2, 'setosa'),
(4.9, 2.5, 4.5, 1.7, 'virginica')]

# int: data length multiplied by percent