I've been learning Spark recently. Ideally I'd be writing Scala, but I've never touched Scala and would have to learn it first, so I'm warming up with Spark's Java API. It turns out Java 8's functional style has a lot in common with Scala anyway, so once I've got the Java API down, picking up Scala should be much easier!
Naturally we start with the evergreen word count, plus sorted output; details are in the comments (^o^)/
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("local").setAppName("word count");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> lines = sc.textFile("xxx.txt");

// Split each line on the \u0001 (Ctrl-A) delimiter, emit a (word, 1) pair
// for each word, then sum the counts per word.
JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split("\\001")).iterator())
        .mapToPair(w -> new Tuple2<String, Integer>(w, 1))
        .reduceByKey((x, y) -> x + y);

// sortByKey only sorts by key, so swap to (count, word), sort descending,
// swap back to (word, count), and print the result.
counts
        .mapToPair(s -> new Tuple2<Integer, String>(s._2, s._1))
        .sortByKey(false)
        .mapToPair(s -> new Tuple2<String, Integer>(s._2, s._1))
        .collect()
        .forEach(tuple -> System.out.println(tuple._1() + ": " + tuple._2()));

sc.stop();
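One caveat worth noting: collect() pulls the entire result back to the driver, which is fine for a toy file but risky on real data. A minimal sketch of a safer variant, assuming the same counts RDD from above (the top-10 cutoff is my own choice, not from the original code):

// Sort by count descending, but only bring the 10 most frequent
// words back to the driver instead of collecting everything.
counts
        .mapToPair(s -> new Tuple2<Integer, String>(s._2, s._1))
        .sortByKey(false)
        .take(10)
        .forEach(t -> System.out.println(t._2() + ": " + t._1()));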