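How to Build a Classification Model with Spark and pandas
This chapter uses sklearn's iris dataset and converts between a pandas DataFrame and a Spark DataFrame to build a Spark multiclass classification model, then tunes its hyperparameters to obtain the best parameters and the evaluation score. The code proceeds as follows:
Import the required packages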
import numpy as np
import pandas as pd
import sklearn.datasets as sd
from pyspark import SparkContext
from pyspark.sql import SparkSession, Row
from pyspark.ml.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.classification import LogisticRegression
from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
from pyspark.ml.evaluation import MulticlassClassificationEvaluator, BinaryClassificationEvaluator
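To make the pandas-to-Spark conversion mentioned in the introduction concrete, here is a minimal sketch. It is only one possible way to do it, not necessarily the approach this chapter follows. The column names, the helper names `load_iris_pandas`, `to_spark_features` and `run_demo`, and the use of `VectorAssembler` are assumptions made for illustration.

```python
import pandas as pd
import sklearn.datasets as sd


def load_iris_pandas():
    """Load sklearn's iris data into a pandas DataFrame (4 features + label)."""
    iris = sd.load_iris()
    # Hypothetical column names: plain identifiers are easier to handle in Spark
    # than sklearn's original names, which contain spaces and parentheses.
    cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    pdf = pd.DataFrame(iris.data, columns=cols)
    # Spark ML classifiers expect a numeric label column, so store it as float.
    pdf["label"] = iris.target.astype("float64")
    return pdf


def to_spark_features(spark, pdf):
    """Convert a pandas DataFrame to a Spark DataFrame with a `features` vector column."""
    # Imported lazily so the pandas part above runs without a Spark installation.
    from pyspark.ml.feature import VectorAssembler

    sdf = spark.createDataFrame(pdf)  # pandas DataFrame -> Spark DataFrame
    feature_cols = [c for c in pdf.columns if c != "label"]
    # Assumption: VectorAssembler packs the numeric columns into one vector column.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    return assembler.transform(sdf).select("features", "label")


def run_demo():
    """Start a local Spark session, convert the data, and convert part of it back."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("iris-sketch").getOrCreate()
    sdf = to_spark_features(spark, load_iris_pandas())
    sdf.show(5)
    # Spark DataFrame -> pandas DataFrame
    print(sdf.limit(5).toPandas())
    spark.stop()


pdf = load_iris_pandas()
```

Calling `run_demo()` requires a working local Spark installation. The resulting `features`/`label` layout is the default input that `pyspark.ml` estimators such as `LogisticRegression` look for.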