braaannigan

Reputation: 874

In Polars how can I display a single row from a dataframe vertically like a pandas series?

I have a polars dataframe with many columns. I want to look at all the data from a single row aligned vertically so that I can see the values in many different columns without it going off the edge of the screen. How can I do this?

For example, define a dataframe:

import polars as pl

df = pl.DataFrame({'a':[0,1],'b':[2,3]})

Print df[0] in ipython/jupyter and I get:

[Image: output from a single row of the dataframe]

But if I convert df to pandas and print df.iloc[0] I get:

[Image: output from pandas]

The latter is very handy when you've got many columns.

I've tried things like df[0].to_series(), but that returns only the first column (a single value for a one-row frame), not the whole first row.

My suspicion is that there isn't a direct replacement because the pandas method relies on the series having an index. I think the polars solution will be more like making a two-column dataframe where one column holds the column names and the other holds the values. I'm not sure if there's a method to do that, though.

Thanks for any help you can offer!

Upvotes: 5

Views: 11918

Answers (4)

ghuls

Reputation: 476

import polars as pl
import numpy as np

# Create dataframe with lots of columns.
df = pl.DataFrame(np.random.randint(0, 1000, (5, 100)))

df
shape: (5, 100)
┌──────────┬──────────┬──────────┬──────────┬───────────┬───────────┬───────────┬───────────┐
│ column_0 ┆ column_1 ┆ column_2 ┆ column_3 ┆ column_96 ┆ column_97 ┆ column_98 ┆ column_99 │
│ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---       ┆ ---       ┆ ---       ┆ ---       │
│ i64      ┆ i64      ┆ i64      ┆ i64      ┆ i64       ┆ i64       ┆ i64       ┆ i64       │
╞══════════╪══════════╪══════════╪══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ 285      ┆ 366      ┆ 886      ┆ 981      ┆ 63        ┆ 326       ┆ 882       ┆ 564       │
│ 735      ┆ 269      ┆ 381      ┆ 78       ┆ 556       ┆ 737       ┆ 741       ┆ 768       │
│ 543      ┆ 729      ┆ 915      ┆ 901      ┆ 48        ┆ 21        ┆ 277       ┆ 818       │
│ 264      ┆ 424      ┆ 285      ┆ 540      ┆ 602       ┆ 584       ┆ 888       ┆ 836       │
│ 269      ┆ 701      ┆ 483      ┆ 817      ┆ 579       ┆ 873       ┆ 192       ┆ 734       │
└──────────┴──────────┴──────────┴──────────┴───────────┴───────────┴───────────┴───────────┘
# Display the third row (index 2) as (column name, value) pairs.
tuple(zip(df.columns, df.row(2)))
(('column_0', 543),
 ('column_1', 729),
 ('column_2', 915),
 ('column_3', 901),
 ('column_4', 332),
 ('column_5', 156),
 ('column_6', 624),
 ('column_7', 37),
 ('column_8', 341),
 ('column_9', 503),
 ('column_10', 135),
 ('column_11', 183),
 ('column_12', 651),
 ('column_13', 910),
 ('column_14', 625),
 ('column_15', 129),
 ('column_16', 604),
 ('column_17', 671),
 ('column_18', 976),
 ('column_19', 558),
 ('column_20', 159),
 ('column_21', 314),
 ('column_22', 460),
 ('column_23', 49),
 ('column_24', 944),
 ('column_25', 6),
 ('column_26', 470),
 ('column_27', 228),
 ('column_28', 615),
 ('column_29', 230),
 ('column_30', 217),
 ('column_31', 66),
 ('column_32', 999),
 ('column_33', 440),
 ('column_34', 519),
 ('column_35', 851),
 ('column_36', 37),
 ('column_37', 859),
 ('column_38', 560),
 ('column_39', 870),
 ('column_40', 892),
 ('column_41', 192),
 ('column_42', 541),
 ('column_43', 136),
 ('column_44', 631),
 ('column_45', 22),
 ('column_46', 522),
 ('column_47', 225),
 ('column_48', 610),
 ('column_49', 191),
 ('column_50', 886),
 ('column_51', 454),
 ('column_52', 312),
 ('column_53', 956),
 ('column_54', 473),
 ('column_55', 851),
 ('column_56', 760),
 ('column_57', 224),
 ('column_58', 859),
 ('column_59', 442),
 ('column_60', 234),
 ('column_61', 788),
 ('column_62', 53),
 ('column_63', 999),
 ('column_64', 473),
 ('column_65', 237),
 ('column_66', 247),
 ('column_67', 307),
 ('column_68', 916),
 ('column_69', 94),
 ('column_70', 714),
 ('column_71', 233),
 ('column_72', 995),
 ('column_73', 335),
 ('column_74', 454),
 ('column_75', 801),
 ('column_76', 742),
 ('column_77', 386),
 ('column_78', 196),
 ('column_79', 239),
 ('column_80', 723),
 ('column_81', 59),
 ('column_82', 929),
 ('column_83', 852),
 ('column_84', 722),
 ('column_85', 328),
 ('column_86', 59),
 ('column_87', 710),
 ('column_88', 238),
 ('column_89', 823),
 ('column_90', 75),
 ('column_91', 307),
 ('column_92', 472),
 ('column_93', 822),
 ('column_94', 582),
 ('column_95', 802),
 ('column_96', 48),
 ('column_97', 21),
 ('column_98', 277),
 ('column_99', 818))

Pandas does not display all values either when there are many columns; the Series output is truncated:

In [121]: df.to_pandas().iloc[0]
Out[121]: 
column_0     285
column_1     366
column_2     886
column_3     981
column_4     464
            ... 
column_95    862
column_96     63
column_97    326
column_98    882
column_99    564
Name: 0, Length: 100, dtype: int64

Upvotes: 2

user18559875

Reputation:

You can try using unpivot. For example:

import polars as pl

df = pl.DataFrame(
    [
        pl.Series(name="col_str", values=["string1", "string2"]),
        pl.Series(name="col_bool", values=[False, True]),
        pl.Series(name="col_int", values=[1, 2]),
        pl.Series(name="col_float", values=[10.0, 20.0]),
        *[pl.Series(name=f"col_other_{idx}", values=[idx] * 2)
          for idx in range(1, 25)],
    ]
)
print(df)
shape: (2, 28)
┌─────────┬──────────┬─────────┬───────────┬───┬──────────────┬──────────────┬──────────────┬──────────────┐
│ col_str ┆ col_bool ┆ col_int ┆ col_float ┆ … ┆ col_other_21 ┆ col_other_22 ┆ col_other_23 ┆ col_other_24 │
│ ---     ┆ ---      ┆ ---     ┆ ---       ┆   ┆ ---          ┆ ---          ┆ ---          ┆ ---          │
│ str     ┆ bool     ┆ i64     ┆ f64       ┆   ┆ i64          ┆ i64          ┆ i64          ┆ i64          │
╞═════════╪══════════╪═════════╪═══════════╪═══╪══════════════╪══════════════╪══════════════╪══════════════╡
│ string1 ┆ false    ┆ 1       ┆ 10.0      ┆ … ┆ 21           ┆ 22           ┆ 23           ┆ 24           │
│ string2 ┆ true     ┆ 2       ┆ 20.0      ┆ … ┆ 21           ┆ 22           ┆ 23           ┆ 24           │
└─────────┴──────────┴─────────┴───────────┴───┴──────────────┴──────────────┴──────────────┴──────────────┘

To print the first row:

pl.Config.set_tbl_rows(100)
df[0].unpivot()
┌──────────────┬─────────┐
│ variable     ┆ value   │
│ ---          ┆ ---     │
│ str          ┆ str     │
╞══════════════╪═════════╡
│ col_str      ┆ string1 │
│ col_bool     ┆ false   │
│ col_int      ┆ 1       │
│ col_float    ┆ 10.0    │
│ col_other_1  ┆ 1       │
│ col_other_2  ┆ 2       │
│ col_other_3  ┆ 3       │
│ col_other_4  ┆ 4       │
│ col_other_5  ┆ 5       │
│ col_other_6  ┆ 6       │
│ col_other_7  ┆ 7       │
│ col_other_8  ┆ 8       │
│ col_other_9  ┆ 9       │
│ col_other_10 ┆ 10      │
│ col_other_11 ┆ 11      │
│ col_other_12 ┆ 12      │
│ col_other_13 ┆ 13      │
│ col_other_14 ┆ 14      │
│ col_other_15 ┆ 15      │
│ col_other_16 ┆ 16      │
│ col_other_17 ┆ 17      │
│ col_other_18 ┆ 18      │
│ col_other_19 ┆ 19      │
│ col_other_20 ┆ 20      │
│ col_other_21 ┆ 21      │
│ col_other_22 ┆ 22      │
│ col_other_23 ┆ 23      │
│ col_other_24 ┆ 24      │
└──────────────┴─────────┘

If needed, set the polars.Config.set_tbl_rows option to the number of rows you find acceptable. (This only needs to be done once per session, not every time you print.)

Notice that all values have been cast to the supertype str. (One caution: this approach won't work if any of your columns are of dtype list.)

Upvotes: 4

Hericks

Reputation: 10039

This is the perfect use-case for pl.DataFrame.glimpse.

import polars as pl

df = pl.DataFrame({"a": [1], "b": [True], "c": ["Carbonara"]})

df[0].glimpse()
Rows: 1
Columns: 3
$ a  <i64> 1
$ b <bool> True
$ c  <str> 'Carbonara'

Upvotes: 6

benji

Reputation: 196

You could check the indexing section of the Polars Cookbook, which gives this mapping:

| Operation  | pandas     | polars   |
|------------|------------|----------|
| select row | df.iloc[2] | df[2, :] |

Cheers!

Upvotes: -1
