Reputation: 65
If I have an R data.frame df
and
colnames(df)
,
[1] "a" "b" "c" "d" "e"
I can select columns "a", "c", "d" and "e" as follow:
df[ , c(1, 3:5)]
There is a simple equivalent in pandas? I know I can use
df.loc[:, ['a', 'c', 'd', 'e']]
and this is fine for few columns.
For many sequences of columns, R code is still straightforward
df2[ , c(1:10, 25:30, 40, 50:100)]
Upvotes: 4
Views: 1013
Reputation: 3825
Now you can use similar syntax in python:
>>> from datar.all import c, f, select
>>> from datar.datasets import starwars
>>>
>>> starwars
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species
<object> <float64> <float64> <object> <object> <object> <float64> <object> <object> <object> <object>
0 Luke Skywalker 172.0 77.0 blond fair blue 19.0 male masculine Tatooine Human
1 C-3PO 167.0 75.0 NaN gold yellow 112.0 none masculine Tatooine Droid
2 R2-D2 96.0 32.0 NaN white, blue red 33.0 none masculine Naboo Droid
3 Darth Vader 202.0 136.0 none white yellow 41.9 male masculine Tatooine Human
.. ... ... ... ... ... ... ... ... ... ... ...
4 Leia Organa 150.0 49.0 brown light brown 19.0 female feminine Alderaan Human
82 Rey NaN NaN brown light hazel NaN female feminine NaN Human
83 Poe Dameron NaN NaN brown light brown NaN male masculine NaN Human
84 BB8 NaN NaN none none black NaN none masculine NaN Droid
85 Captain Phasma NaN NaN unknown unknown unknown NaN NaN NaN NaN NaN
86 Padmé Amidala 165.0 45.0 brown light brown 46.0 female feminine Naboo Human
[87 rows x 11 columns]
>>>
>>> starwars >> select(c(1, f[3:5], 7))
name mass hair_color skin_color birth_year
<object> <float64> <object> <object> <float64>
0 Luke Skywalker 77.0 blond fair 19.0
1 C-3PO 75.0 NaN gold 112.0
2 R2-D2 32.0 NaN white, blue 33.0
3 Darth Vader 136.0 none white 41.9
.. ... ... ... ... ...
4 Leia Organa 49.0 brown light 19.0
82 Rey NaN brown light NaN
83 Poe Dameron NaN brown light NaN
84 BB8 NaN none none NaN
85 Captain Phasma NaN unknown unknown NaN
86 Padmé Amidala 45.0 brown light 46.0
[87 rows x 5 columns]
>>>
>>> # even with column names
>>> starwars >> select(c(f.name, f[f.mass:f.skin_color], f.birth_year))
name mass hair_color skin_color birth_year
<object> <float64> <object> <object> <float64>
0 Luke Skywalker 77.0 blond fair 19.0
1 C-3PO 75.0 NaN gold 112.0
2 R2-D2 32.0 NaN white, blue 33.0
3 Darth Vader 136.0 none white 41.9
.. ... ... ... ... ...
4 Leia Organa 49.0 brown light 19.0
82 Rey NaN brown light NaN
83 Poe Dameron NaN brown light NaN
84 BB8 NaN none none NaN
85 Captain Phasma NaN unknown unknown NaN
86 Padmé Amidala 45.0 brown light 46.0
[87 rows x 5 columns]
I am the author of the datar
package. Feel free to submit issues if you have any questions.
Upvotes: 0
Reputation: 25662
UPDATE: No need to use numpy.hstack
, you can just call numpy.r_
as below
Use iloc
+ numpy.r_
:
In [20]: df = DataFrame(randn(10, 3), columns=list('abc'))
In [21]: df
Out[21]:
a b c
0 0.228163 -1.311485 -1.335604
1 0.292547 -1.636901 0.001765
2 0.744605 -0.325580 0.205003
3 -0.580471 -0.531553 -0.740697
4 0.250574 1.076019 -0.594915
5 -0.148449 0.076951 -0.653595
6 -1.065314 -0.166018 -1.471532
7 1.133336 -0.529738 -1.213841
8 -1.715281 -2.058831 0.113237
9 -0.382412 -0.072540 0.294853
[10 rows x 3 columns]
In [22]: df.iloc[:, r_[:2]]
Out[22]:
a b
0 0.228163 -1.311485
1 0.292547 -1.636901
2 0.744605 -0.325580
3 -0.580471 -0.531553
4 0.250574 1.076019
5 -0.148449 0.076951
6 -1.065314 -0.166018
7 1.133336 -0.529738
8 -1.715281 -2.058831
9 -0.382412 -0.072540
[10 rows x 2 columns]
To concatenate integer ranges use numpy.r_
:
In [35]: df = DataFrame(randn(10, 6), columns=list('abcdef'))
In [36]: df.iloc[:, r_[:2, 2:df.columns.size:2]]
Out[36]:
a b c e
0 -1.358623 -0.622909 0.025609 -1.166303
1 0.527027 0.310530 2.892384 0.190451
2 -0.251138 -1.246113 0.738264 0.062078
3 -1.716028 0.419139 0.060225 -1.191527
4 -1.308635 0.045396 -0.599367 -0.202491
5 -0.620343 0.796364 -0.008802 0.160020
6 0.199739 0.111816 -0.278119 1.051317
7 -0.311206 0.090348 -0.237887 0.958215
8 0.363161 2.449031 1.023352 0.743853
9 0.039451 -0.855733 -0.836921 -0.835078
[10 rows x 4 columns]
Upvotes: 8