tommy.carstensen
tommy.carstensen

Reputation: 9622

How to get a value from a cell of a dataframe by position or label?

I read the questions "How to get a value from a cell of a dataframe?" and "How to select the last column of dataframe". I am a bit confused as to what is the accepted method in 2018 for getting a value from the cell of a pandas dataframe, as get_value has been deprecated and the documentation on at is somewhat sparse. I want to select the row by label and the column by position/integer. Is my method below in which I use iloc the accepted approach in 2018? I have this code:

import pandas as pd
import random
import string
import csv
import io

## Make a table with random column labels and rows in random order.
s = '\n'.join((
    ## Separate columns by comma.
    ','.join((
        ## Join letters without a separator.
        ''.join((
            ## Randomly choose 3 characters for the labels from ascii_letters.
            random.choice(
                string.ascii_letters) for i in range(3))) for j in range(3))),
    ## Insert a header to distinguish rows with identical index keys x and y.
    '"Header I, II and III"',
    ## Randomly shuffle the rows under the first header.
    '\n'.join(random.sample(('x,0,1', 'y,2,3',), 2)),
    ## Insert a header to distinguish rows with identical index keys x and y.
    '"Header IV, V and VI"',
    ## Randomly shuffle the rows under the second header.
    '\n'.join(random.sample(('x,4,5', 'y,6,7'), 2)),
    ))

def preprocess(s):
    header = ''
    for l in csv.reader(io.StringIO(s)):
        if len(l) == 1:
            header = l[0]
            continue
        ## Append the header to distinguish identical labels x and y.
        l[0] = header + '; ' + l[0]
        yield l

print(s, end='\n\n')
## Preprocess the string to avoid duplicate row index keys x and y.
df = pd.DataFrame(preprocess(s))
print(df, end='\n\n')
## Set the index to be that of the first column.
df = df.set_index(0)
## First select the column by index using iloc
## and then select the relevant row by index label.
value = df.iloc[:,-1]['Header I, II and III; x']
print(value)

It generates a string s looking like this:

YuT,Uva,AsE
"Header I, II and III"
y,2,3
x,0,1
"Header IV, V and VI"
y,6,7
x,4,5

Which is converted to a dataframe by the function preprocess looking like this:

                         0    1    2
0                    ; YuT  Uva  AsE
1  Header I, II and III; y    2    3
2  Header I, II and III; x    0    1
3   Header IV, V and VI; y    6    7
4   Header IV, V and VI; x    4    5

It is the value (the integer 1) from the last column of the row with the label Header I, II and III; x, which I'm interested in. Is this the correct approach in 2018?

value = df.iloc[:,-1]['Header I, II and III; x']

I just read some very interesting questions "Loc vs. iloc vs. ix vs. at vs. iat?" from February 2015 and "pandas iloc vs ix vs loc explanation?" from July 2015. It would be great, if I could do something like this, but I can't:

value = df.at['Header I, II and III; x', -1]
value = df['Header I, II and III; x'][-1]

Upvotes: 2

Views: 6461

Answers (2)

tommy.carstensen
tommy.carstensen

Reputation: 9622

This solution definitely works:

value = df.at['Header I, II and III; x', df.columns[-1]]

For those like myself that like old-fashioned dictionaries it is also possible to do the following, if the dataframe does not contain duplicate row labels, which prevents the transpose from working:

d = df.transpose().to_dict('list')
value = d['Header I, II and III; x'][-1]

Upvotes: 1

cs95
cs95

Reputation: 402523

First thing's first. ix is deprecated, but ix allowed you to mix labels and indexers, and made a lot of guesses about what it was passed.

In today's day and age (with the current stable release being v0.22), ix is deprecated, so stick to explicit label or positional based indexers: loc for label based slicing, iloc for index based slicing; at for label based item access, and iat for index based item access.

If you know what your labels are, use at to access a single item -

df.at['Header I, II and III; x', df.columns[-1]]

If you know the position, use iat -

df.iat[2, -1]

In general, use *at when you want to access a single element, and *loc when you want to access a row/column slice.

Upvotes: 3

Related Questions