Reputation: 76
I have a DataFrame that contains data about students going to school. It has columns such as rank, major_code, major, unemployed, etc.
I used df.values (in this case recent_grads.values) to return a NumPy representation of the DataFrame.
recent_grads_np = recent_grads.values
print(recent_grads_np)
This works because df.values converts the entire DataFrame into a NumPy array. The result is this:
[[1 2419 'PETROLEUM ENGINEERING' ... 1534 364 193]
[2 2416 'MINING AND MINERAL ENGINEERING' ... 350 257 50]
[3 2415 'METALLURGICAL ENGINEERING' ... 456 176 0]
...
[172 5203 'COUNSELING PSYCHOLOGY' ... 2403 1245 308]
[173 3501 'LIBRARY SCIENCE' ... 288 338 192]]
How do I select only a few columns from the DataFrame and then convert them into a NumPy array?
Upvotes: 5
Views: 7315
Reputation: 11
I know this is an old post, but I also needed something similar, and I found the following approach. You can also use the apply function with lambda. Thus, you can create your own type of array and even convert the data, like this:
raw_converted_data = df[["A", "B", "C"]].apply(lambda x: [x["A"], x["B"], x["C"]], axis=1)
list(raw_converted_data)
In this example, the first line returns a pandas Series in which each entry is a list holding that row's values for the selected columns. The second line then turns that Series into a plain Python list of lists.
I hope this answer helps someone else also looking for this.
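To make the two steps concrete, here is a minimal sketch. The small DataFrame and its values are made up for illustration, and the final np.array call is an extra step beyond the two lines above, added because the question asks for a NumPy array rather than a list.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# Each row becomes a list of the selected values; the result is a Series of lists.
raw_converted_data = df[["A", "B", "C"]].apply(
    lambda x: [x["A"], x["B"], x["C"]], axis=1
)

# Turn the Series into a plain list of lists.
as_list = list(raw_converted_data)

# Extra step (not in the original two lines): build a 2-D NumPy array from the lists.
as_array = np.array(as_list)
print(as_array.shape)
```

The lambda is the place to convert or reorder values per row. If no per-row conversion is needed, selecting the columns and converting directly is shorter and usually faster, since apply with axis=1 runs a Python function for every row.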
Upvotes: 0
Reputation: 51
You can simply append .values to your DataFrame and it will give you a NumPy array.
To select specific columns, use df[["A","B","C"]], where "A", "B", and "C" are your column names.
So df[["A","B","C"]].values will give you what you asked for.
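Applied to the question's DataFrame, this could look like the sketch below. The three-row recent_grads here is a hypothetical stand-in built only from the values visible in the printed output. The to_numpy() call is an addition: it is the method the pandas documentation recommends over .values in pandas 0.24 and later.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for recent_grads, using values visible in the question's output.
recent_grads = pd.DataFrame({
    "rank": [1, 2, 3],
    "major_code": [2419, 2416, 2415],
    "major": [
        "PETROLEUM ENGINEERING",
        "MINING AND MINERAL ENGINEERING",
        "METALLURGICAL ENGINEERING",
    ],
})

# Select only the wanted columns first, then convert.
selected = recent_grads[["rank", "major_code"]]

subset_np = selected.values        # approach from this answer
subset_np_alt = selected.to_numpy()  # equivalent, recommended in newer pandas

print(subset_np)
```

Selecting only numeric columns yields a numeric array. If a text column such as major is included, the resulting array has dtype object, as in the output shown in the question.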
Upvotes: 4