  1. Apache Spark™ - Unified Engine for large-scale data analytics

    Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

  2. Overview - Spark 4.0.0 Documentation

    If you’d like to build Spark from source, visit Building Spark. Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a …

  3. Downloads - Apache Spark

    Spark Docker images are available from Docker Hub under the accounts of both The Apache Software Foundation and Official Images. Note that these images contain non-ASF software …

  4. Configuration - Spark 4.0.1 Documentation

    Spark provides three locations to configure the system: Spark properties control most application parameters and can be set by using a SparkConf object, or through Java system properties. …

  5. Spark Connect Overview - Spark 4.1.0-preview2 Documentation

    We will walk through how to run an Apache Spark server with Spark Connect and connect to it from a client application using the Spark Connect client library. Download and start Spark …

  6. Performance Tuning - Spark 4.0.1 Documentation

    Apache Spark’s ability to choose the best execution plan among many possible options is determined in part by its estimates of how many rows will be output by every node in the …

  7. Structured Streaming Programming Guide - Spark 4.0.1 …

    Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you would express a …

  8. pyspark.sql.functions.datepart — PySpark 4.0.1 documentation

    >>> import datetime
    >>> from pyspark.sql import functions as sf
    >>> df = spark.createDataFrame([(datetime.datetime(2015, 4, 8, 13, 8, 15),)], ['ts'])
    >>> df.select(
    ...     '*', …

  9. CSV Files - Spark 4.0.1 Documentation - Apache Spark

    Spark SQL provides spark.read().csv("file_name") to read a file or directory of files in CSV format into Spark DataFrame, and dataframe.write().csv("path") to write to a CSV file.

  10. Spark Release 4.0.0 - Apache Spark

    Apache Spark 4.0.0 marks a significant milestone as the inaugural release in the 4.x series, embodying the collective effort of the vibrant open-source community.