Oswaldo
Oswaldo

Reputation: 65

pandas DataFrame: select a set of columns including a sequence of columns

If I have an R data.frame df and

colnames(df)
[1] "a" "b" "c" "d" "e"
,

I can select columns "a", "c", "d" and "e" as follow:

df[ , c(1, 3:5)]

There is a simple equivalent in pandas? I know I can use

df.loc[:, ['a', 'c', 'd', 'e']]

and this is fine for few columns.

For many sequences of columns, R code is still straightforward

df2[ , c(1:10, 25:30, 40, 50:100)]

Upvotes: 4

Views: 1013

Answers (2)

Panwen Wang
Panwen Wang

Reputation: 3825

Now you can use similar syntax in python:

>>> from datar.all import c, f, select
>>> from datar.datasets import starwars
>>> 
>>> starwars
              name    height      mass hair_color   skin_color eye_color  birth_year      sex     gender homeworld  species
          <object> <float64> <float64>   <object>     <object>  <object>   <float64> <object>   <object>  <object> <object>
0   Luke Skywalker     172.0      77.0      blond         fair      blue        19.0     male  masculine  Tatooine    Human
1            C-3PO     167.0      75.0        NaN         gold    yellow       112.0     none  masculine  Tatooine    Droid
2            R2-D2      96.0      32.0        NaN  white, blue       red        33.0     none  masculine     Naboo    Droid
3      Darth Vader     202.0     136.0       none        white    yellow        41.9     male  masculine  Tatooine    Human
..             ...       ...       ...        ...          ...       ...         ...      ...        ...       ...      ...
4      Leia Organa     150.0      49.0      brown        light     brown        19.0   female   feminine  Alderaan    Human
82             Rey       NaN       NaN      brown        light     hazel         NaN   female   feminine       NaN    Human
83     Poe Dameron       NaN       NaN      brown        light     brown         NaN     male  masculine       NaN    Human
84             BB8       NaN       NaN       none         none     black         NaN     none  masculine       NaN    Droid
85  Captain Phasma       NaN       NaN    unknown      unknown   unknown         NaN      NaN        NaN       NaN      NaN
86   Padmé Amidala     165.0      45.0      brown        light     brown        46.0   female   feminine     Naboo    Human

[87 rows x 11 columns]
>>> 
>>> starwars >> select(c(1, f[3:5], 7))
              name      mass hair_color   skin_color  birth_year
          <object> <float64>   <object>     <object>   <float64>
0   Luke Skywalker      77.0      blond         fair        19.0
1            C-3PO      75.0        NaN         gold       112.0
2            R2-D2      32.0        NaN  white, blue        33.0
3      Darth Vader     136.0       none        white        41.9
..             ...       ...        ...          ...         ...
4      Leia Organa      49.0      brown        light        19.0
82             Rey       NaN      brown        light         NaN
83     Poe Dameron       NaN      brown        light         NaN
84             BB8       NaN       none         none         NaN
85  Captain Phasma       NaN    unknown      unknown         NaN
86   Padmé Amidala      45.0      brown        light        46.0

[87 rows x 5 columns]
>>> 
>>> # even with column names
>>> starwars >> select(c(f.name, f[f.mass:f.skin_color], f.birth_year))
              name      mass hair_color   skin_color  birth_year
          <object> <float64>   <object>     <object>   <float64>
0   Luke Skywalker      77.0      blond         fair        19.0
1            C-3PO      75.0        NaN         gold       112.0
2            R2-D2      32.0        NaN  white, blue        33.0
3      Darth Vader     136.0       none        white        41.9
..             ...       ...        ...          ...         ...
4      Leia Organa      49.0      brown        light        19.0
82             Rey       NaN      brown        light         NaN
83     Poe Dameron       NaN      brown        light         NaN
84             BB8       NaN       none         none         NaN
85  Captain Phasma       NaN    unknown      unknown         NaN
86   Padmé Amidala      45.0      brown        light        46.0

[87 rows x 5 columns]

I am the author of the datar package. Feel free to submit issues if you have any questions.

Upvotes: 0

Phillip Cloud
Phillip Cloud

Reputation: 25662

UPDATE: No need to use numpy.hstack, you can just call numpy.r_ as below

Use iloc + numpy.r_:

In [20]: df = DataFrame(randn(10, 3), columns=list('abc'))

In [21]: df
Out[21]: 
          a         b         c
0  0.228163 -1.311485 -1.335604
1  0.292547 -1.636901  0.001765
2  0.744605 -0.325580  0.205003
3 -0.580471 -0.531553 -0.740697
4  0.250574  1.076019 -0.594915
5 -0.148449  0.076951 -0.653595
6 -1.065314 -0.166018 -1.471532
7  1.133336 -0.529738 -1.213841
8 -1.715281 -2.058831  0.113237
9 -0.382412 -0.072540  0.294853

[10 rows x 3 columns]

In [22]: df.iloc[:, r_[:2]]
Out[22]: 
          a         b
0  0.228163 -1.311485
1  0.292547 -1.636901
2  0.744605 -0.325580
3 -0.580471 -0.531553
4  0.250574  1.076019
5 -0.148449  0.076951
6 -1.065314 -0.166018
7  1.133336 -0.529738
8 -1.715281 -2.058831
9 -0.382412 -0.072540

[10 rows x 2 columns]

To concatenate integer ranges use numpy.r_:

In [35]: df = DataFrame(randn(10, 6), columns=list('abcdef'))

In [36]: df.iloc[:, r_[:2, 2:df.columns.size:2]]
Out[36]: 
          a         b         c         e
0 -1.358623 -0.622909  0.025609 -1.166303
1  0.527027  0.310530  2.892384  0.190451
2 -0.251138 -1.246113  0.738264  0.062078
3 -1.716028  0.419139  0.060225 -1.191527
4 -1.308635  0.045396 -0.599367 -0.202491
5 -0.620343  0.796364 -0.008802  0.160020
6  0.199739  0.111816 -0.278119  1.051317
7 -0.311206  0.090348 -0.237887  0.958215
8  0.363161  2.449031  1.023352  0.743853
9  0.039451 -0.855733 -0.836921 -0.835078

[10 rows x 4 columns]

Upvotes: 8

Related Questions